(a) the following amino acid sequence region (C-terminus to N-terminus orientation) and/or (b) the following nucleic acid sequence
region (3' to 5' orientation) traversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the GPCR: (a) P¹ AA₁₅
X and/or (b) Pcodon (AA-codon)₁₅ Xcodon, respectively. In a most preferred embodiment, P¹ and Pcodon are endogenous proline and an
endogenous nucleic acid encoding region encoding proline, respectively, located within TM6 of the non-endogenous GPCR; AA₁₅ and
(AA-codon)₁₅ are 15 endogenous amino acid residues and 15 codons encoding endogenous amino acid residues, respectively; and X and
Xcodon are non-endogenous lysine and a non-endogenous nucleic acid encoding region encoding lysine, respectively, located within IC3
of the non-endogenous GPCR. Because it is most preferred that the non-endogenous human GPCRs which incorporate these mutations
be incorporated into mammalian cells and utilized for the screening of candidate compounds, the non-endogenous human GPCR
incorporating the mutation need not be purified and isolated per se (i.e., it is incorporated within the cellular membrane of a mammalian
cell), although such purified and isolated non-endogenous human GPCRs are well within the purview of this disclosure.



BNSDOCID: <WO 0022129A1 J_> 







WO 00/22129 PCT/US99/23938 



NON-ENDOGENOUS, CONSTITUTIVELY ACTIVATED 
HUMAN G PROTEIN-COUPLED RECEPTORS 

The benefits of commonly owned U.S. Serial Number 09/170,496, filed
October 13, 1998; U.S. Serial Number 08/839,449, filed April 14, 1997 (now abandoned);
U.S. Serial Number 09/060,188, filed April 14, 1998; U.S. Provisional Number 60/090,783,
filed June 26, 1998; and U.S. Provisional Number 60/095,677, filed August 7, 1998, are
hereby claimed. Each of the foregoing applications is incorporated by reference herein in
its entirety.

FIELD OF THE INVENTION 

The invention disclosed in this patent document relates to transmembrane
receptors, and more particularly to human G protein-coupled receptors (GPCRs) which have
been altered such that the altered GPCRs are constitutively activated. Most preferably, the altered
human GPCRs are used for the screening of therapeutic compounds.

BACKGROUND OF THE INVENTION

Although a number of receptor classes exist in humans, by far the most abundant and
therapeutically relevant is the G protein-coupled receptor (GPCR or GPCRs) class.
It is estimated that there are some 100,000 genes within the human genome, and of these,
approximately 2%, or 2,000 genes, are estimated to code for GPCRs. Of these, there are
approximately 100 GPCRs for which the endogenous ligand that binds to the GPCR has been
identified. Because of the significant time lag between the discovery of an endogenous
GPCR and the discovery of its endogenous ligand, it can be presumed that the remaining 1,900 GPCRs will be
identified and characterized long before the endogenous ligands for these receptors are identified.
Indeed, the rapidity with which the Human Genome Project is sequencing the 100,000 human genes
indicates that the remaining human GPCRs will be fully sequenced within the next few years.
Nevertheless, and despite the efforts to sequence the human genome, it remains very unclear
how scientists will be able to rapidly, effectively and efficiently exploit this information to
improve and enhance the human condition. The present invention is geared towards this
important objective.

Receptors, including GPCRs, for which the endogenous ligand has been identified are
referred to as "known" receptors, while receptors for which the endogenous ligand has not been
identified are referred to as "orphan" receptors. This distinction is not merely semantic,
particularly in the case of GPCRs. GPCRs represent an important area for the development of
pharmaceutical products: from approximately 20 of the 100 known GPCRs, 60% of all
prescription pharmaceuticals have been developed. Thus, the orphan GPCRs are to the
pharmaceutical industry what gold was to California in the late 19th century: an opportunity to
drive growth, expansion, enhancement and development. A serious drawback exists, however,
with orphan receptors relative to the discovery of novel therapeutics. This is because the
traditional approach to the discovery and development of pharmaceuticals has required access to
both the receptor and its endogenous ligand. Thus, heretofore, orphan GPCRs have presented the
art with a tantalizing and undeveloped resource for the discovery of pharmaceuticals.
Under the traditional approach to the discovery of potential therapeutics, it is generally the
case that the receptor is first identified. Before drug discovery efforts can be initiated, elaborate,
time-consuming and expensive procedures are typically put into place in order to identify, isolate
and generate the receptor's endogenous ligand; this process can require between three and ten
years per receptor, at a cost of about $5 million (U.S.) per receptor. These time and financial
resources must be expended before the traditional approach to drug discovery can commence.
This is because traditional drug discovery techniques rely upon so-called "competitive binding
assays" whereby putative therapeutic agents are "screened" against the receptor in an effort to
discover compounds that either block the endogenous ligand from binding to the receptor
("antagonists"), or enhance or mimic the effects of the ligand binding to the receptor ("agonists").
The overall objective is to identify compounds that prevent cellular activation when the ligand
binds to the receptor (the antagonists), or that enhance or increase cellular activity that would
otherwise occur if the ligand were properly binding with the receptor (the agonists). Because the
endogenous ligands for orphan GPCRs are by definition not identified, discovering novel
and unique therapeutics for these receptors using traditional drug discovery techniques is not
possible. The present invention, as will be set forth in greater detail below, overcomes these and
other severe limitations created by such traditional drug discovery techniques.

GPCRs share a common structural motif. All these receptors have seven sequences of
between 22 and 24 hydrophobic amino acids that form seven alpha helices, each of which spans the
membrane (each span is identified by number, i.e., transmembrane-1 (TM-1), transmembrane-2
(TM-2), etc.). The transmembrane helices are joined by strands of amino acids between
transmembrane-2 and transmembrane-3, transmembrane-4 and transmembrane-5, and
transmembrane-6 and transmembrane-7 on the exterior, or "extracellular", side of the cell
membrane (these are referred to as "extracellular" regions 1, 2 and 3 (EC-1, EC-2 and EC-3),
respectively). The transmembrane helices are also joined by strands of amino acids between
transmembrane-1 and transmembrane-2, transmembrane-3 and transmembrane-4, and
transmembrane-5 and transmembrane-6 on the interior, or "intracellular", side of the cell
membrane (these are referred to as "intracellular" regions 1, 2 and 3 (IC-1, IC-2 and IC-3),
respectively). The "carboxy" ("C") terminus of the receptor lies in the intracellular space within
the cell, and the "amino" ("N") terminus of the receptor lies in the extracellular space outside of
the cell. The general structure of G protein-coupled receptors is depicted in Figure 1.
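The helix-and-loop connectivity described above can be written out as an ordered list of segments. The following Python sketch is illustrative only: the function name `gpcr_topology` is hypothetical, and it models only the canonical seven-helix layout, not the actual segment lengths of any particular receptor. It makes explicit that IC-3 joins TM-5 to TM-6, the region on which the mutations discussed in this document are focused.

```python
def gpcr_topology():
    """Return (segment, location) pairs from the N-terminus to the C-terminus
    for the canonical seven-transmembrane GPCR layout (illustrative only)."""
    segments = [("N-terminus", "extracellular")]
    ic = ec = 0
    for tm in range(1, 8):
        segments.append((f"TM-{tm}", "membrane"))
        if tm == 7:
            break
        if tm % 2 == 1:
            # TM1-TM2, TM3-TM4 and TM5-TM6 are joined on the intracellular side
            ic += 1
            segments.append((f"IC-{ic}", "intracellular"))
        else:
            # TM2-TM3, TM4-TM5 and TM6-TM7 are joined on the extracellular side
            ec += 1
            segments.append((f"EC-{ec}", "extracellular"))
    segments.append(("C-terminus", "intracellular"))
    return segments


for name, side in gpcr_topology():
    print(f"{name:<11}{side}")
```

Listing the segments in order shows the alternation of sides: each intracellular loop follows an odd-numbered helix and each extracellular loop follows an even-numbered one.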

Generally, when an endogenous ligand binds with the receptor (often referred to as
"activation" of the receptor), there is a change in the conformation of the intracellular region that
allows for coupling between the intracellular region and an intracellular "G protein." Although
other G proteins exist, Gq, Gs, Gi and Go are the G proteins that have currently been identified.
Endogenous ligand-activated GPCR coupling with the G protein begins a signaling cascade
process (referred to as "signal transduction"). Under normal conditions, signal transduction
ultimately results in cellular activation or cellular inhibition. It is thought that the IC-3 loop as
well as the carboxy terminus of the receptor interact with the G protein. A principal focus of this
invention is directed to the transmembrane-6 (TM6) region and the intracellular-3 (IC3) region of
the GPCR.

Under physiological conditions, GPCRs exist in the cell membrane in equilibrium between
two different conformations: an "inactive" state and an "active" state. As shown schematically in
Figure 2, a receptor in an inactive state is unable to link to the intracellular signal transduction
pathway to produce a biological response. Changing the receptor conformation to the active state
allows linkage to the transduction pathway (via the G protein) and produces a biological response.

A receptor may be stabilized in an active state by an endogenous ligand or a compound
such as a drug. Recent discoveries, including but not limited to modifications to the
amino acid sequence of the receptor, provide means other than endogenous ligands or drugs to
promote and stabilize the receptor in the active-state conformation. These means effectively
stabilize the receptor in an active state by simulating the effect of an endogenous ligand binding
to the receptor. Stabilization by such ligand-independent means is termed "constitutive receptor
activation."

As noted above, the use of an orphan receptor for screening purposes has not been
possible. This is because the traditional "dogma" regarding screening of compounds mandates that
the ligand for the receptor be known. By definition, then, this approach has no applicability with
respect to orphan receptors. Thus, by adhering to this dogmatic approach to the discovery of
therapeutics, the art, in essence, has taught and has been taught to forsake the use of orphan
receptors unless and until the endogenous ligand for the receptor is discovered. Given that there
are an estimated 2,000 G protein-coupled receptors, the majority of which are orphan receptors,
such dogma castigates a creative, unique and distinct approach to the discovery of therapeutics.

Information regarding the nucleic acid and/or amino acid sequences of a variety of GPCRs
is summarized below in Table A. Because an important focus of the invention disclosed herein
is directed towards orphan GPCRs, many of the below-cited references are related to orphan
GPCRs. However, this list is not intended to imply, nor is this list to be construed, legally or
otherwise, that the invention disclosed herein is only applicable to orphan GPCRs or the specific
GPCRs listed below. Additionally, certain receptors that have been isolated are not the subject of
publications per se; for example, reference is made to a G Protein-Coupled Receptor database on
the "world-wide web" (neither the named inventors nor the assignee have any affiliation with this
site) that lists GPCRs. Other GPCRs are the subject of patent applications owned by the present
assignee and these are not listed below (including GPR3, GPR6 and GPR12; see U.S. Provisional
Number 60/094879):

Table A

Receptor Name    Publication Reference
GPR1             23 Genomics 609 (1994)
GPR4             14 DNA and Cell Biology 25 (1995)
GPR5             14 DNA and Cell Biology 25 (1995)
GPR7             28 Genomics 84 (1995)
GPR8             28 Genomics 84 (1995)
GPR9             184 J. Exp. Med. 963 (1996)
GPR10            29 Genomics 335 (1995)
GPR15            32 Genomics 462 (1996)
GPR17            70 J. Neurochem. 1357 (1998)
GPR18            42 Genomics 462 (1997)
GPR20            187 Gene 75 (1997)
GPR21            187 Gene 75 (1997)
GPR22            187 Gene 75 (1997)
GPR24            398 FEBS Lett. 253 (1996)
GPR30            45 Genomics 607 (1997)
GPR31            42 Genomics 519 (1997)
GPR32            50 Genomics 281 (1997)
GPR40            239 Biochem. Biophys. Res. Commun. 543 (1997)
GPR41            239 Biochem. Biophys. Res. Commun. 543 (1997)
GPR43            239 Biochem. Biophys. Res. Commun. 543 (1997)
APJ              136 Gene 355 (1993)
BLR1             22 Eur. J. Immunol. 2759 (1992)
CEPR             231 Biochem. Biophys. Res. Commun. 651 (1997)
EBI1             23 Genomics 643 (1994)
EBI2             67 J. Virol. 2209 (1993)
ETBR-LP2         424 FEBS Lett. 193 (1998)
GPCR-CNS         54 Brain Res. Mol. Brain Res. 152 (1998); 45 Genomics 68 (1997)
GPR-NGA          394 FEBS Lett. 325 (1996)
H9               386 FEBS Lett. 219 (1996)
HBA954           1261 Biochim. Biophys. Acta 121 (1995)
HG38             247 Biochem. Biophys. Res. Commun. 266 (1998)
HM74             5 Int. Immunol. 1239 (1993)
OGR1             35 Genomics 397 (1996)
V28              163 Gene 295 (1995)



As will be set forth and disclosed in greater detail below, utilization of a mutational cassette to
modify the endogenous sequence of a human GPCR leads to a constitutively activated version of
the human GPCR. These non-endogenous, constitutively activated versions of human GPCRs can
be utilized, inter alia, for the screening of candidate compounds to directly identify compounds
of, e.g., therapeutic relevance.



SUMMARY OF THE INVENTION 

Disclosed herein is a non-endogenous, human G protein-coupled receptor comprising
(a) as a most preferred amino acid sequence region (C-terminus to N-terminus orientation)
and/or (b) as a most preferred nucleic acid sequence region (3' to 5' orientation) traversing
the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions of the GPCR:

(a) P¹ AA₁₅ X

wherein:

(1) P¹ is an amino acid residue located within the TM6 region of
the GPCR, where P¹ is selected from the group consisting of (i)
the endogenous GPCR's proline residue, and (ii) a non-
endogenous amino acid residue other than proline;

(2) AA₁₅ are 15 amino acids selected from the group consisting of
(a) the endogenous GPCR's amino acids, (b) non-endogenous
amino acid residues, and (c) a combination of the endogenous
GPCR's amino acids and non-endogenous amino acids,
excepting that none of the 15 endogenous amino acid residues
that are positioned within the TM6 region of the GPCR is
proline; and

(3) X is a non-endogenous amino acid residue located within the
IC3 region of said GPCR, preferably selected from the group
consisting of lysine, histidine and arginine, and most
preferably lysine, excepting that when the endogenous amino
acid at position X is lysine, then X is an amino acid other than
lysine, preferably alanine;

and/or

(b) Pcodon (AA-codon)₁₅ Xcodon

wherein:

(1) Pcodon is a nucleic acid sequence within the TM6 region of the
GPCR, where Pcodon encodes an amino acid selected from the
group consisting of (i) the endogenous GPCR's proline residue,
and (ii) a non-endogenous amino acid residue other than proline;

(2) (AA-codon)₁₅ are 15 codons encoding 15 amino acids selected
from the group consisting of (a) the endogenous GPCR's amino
acids, (b) non-endogenous amino acid residues, and (c) a
combination of the endogenous GPCR's amino acids and non-
endogenous amino acids, excepting that none of the 15
endogenous codons within the TM6 region of the GPCR encodes
a proline amino acid residue; and

(3) Xcodon is a nucleic acid encoding region located within the
IC3 region of said GPCR, which encodes a non-endogenous
amino acid, preferably selected from the group consisting of
lysine, histidine and arginine, and most preferably lysine,
excepting that when the endogenous encoding region at position
Xcodon encodes the amino acid lysine, then Xcodon encodes an amino
acid other than lysine, preferably alanine.
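As a minimal sketch of how the cassette maps onto a receptor sequence, assume the amino acid sequence is written N-terminus to C-terminus in one-letter codes and that the index of the TM6 proline P¹ is already known (how TM6 is delineated, for example by hydropathy analysis or alignment, is not specified here and is left to the user). Because the cassette is written C-terminus to N-terminus, position X then lies 16 residues N-terminal to P¹, on the IC3 side. The function names and the optional `tm6_start` parameter are hypothetical; `mutate_x` applies the most preferred substitution (lysine, or alanine when the endogenous residue is already lysine), and `x_codon_span` assumes a contiguous coding sequence whose first codon encodes residue 0.

```python
from typing import Optional

CASSETTE_SPAN = 16  # X is 16 residues from P1 (P1 + 15 intervening residues + X)


def locate_x(protein: str, p1_index: int, tm6_start: Optional[int] = None) -> int:
    """Return the 0-based index of position X in an N->C one-letter sequence."""
    if protein[p1_index] != "P":
        raise ValueError("residue at p1_index is not proline")
    x_index = p1_index - CASSETTE_SPAN
    if x_index < 0:
        raise ValueError("not enough residues upstream of P1")
    if tm6_start is not None:
        # None of the endogenous AA15 residues lying within TM6 may be proline.
        if "P" in protein[max(tm6_start, x_index + 1):p1_index]:
            raise ValueError("another proline lies between P1 and the TM6 start")
    return x_index


def mutate_x(protein: str, p1_index: int, tm6_start: Optional[int] = None) -> str:
    """Substitute X with lysine, or with alanine if the endogenous X is lysine."""
    x = locate_x(protein, p1_index, tm6_start)
    replacement = "A" if protein[x] == "K" else "K"
    return protein[:x] + replacement + protein[x + 1:]


def x_codon_span(p1_index: int) -> tuple:
    """Nucleotide slice [start, end) of Xcodon in the assumed coding sequence."""
    x = p1_index - CASSETTE_SPAN
    return (3 * x, 3 * x + 3)


# The hypothetical cassette P-AACCTTGGRRRDDDE-Q, rewritten in N->C order
endogenous = "QEDDDRRRGGTTCCAAP"
print(mutate_x(endogenous, p1_index=16))  # KEDDDRRRGGTTCCAAP
```

Working in N-to-C index space keeps the arithmetic to a single subtraction; the C-to-N orientation of the cassette only determines the sign of the offset.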

The terms endogenous and non-endogenous in reference to these sequence cassettes are relative
to the endogenous GPCR. For example, once the endogenous proline residue is located within the
TM6 region of a particular GPCR, and the 16th amino acid therefrom is identified for mutation to
constitutively activate the receptor, it is also possible to mutate the endogenous proline residue
(i.e., once the marker is located and the 16th amino acid to be mutated is identified, one may mutate
the marker itself), although it is most preferred that the proline residue not be mutated. Similarly,
while it is most preferred that AA₁₅ be maintained in their endogenous forms, these amino
acids may also be mutated. The only amino acid that must be mutated in the non-endogenous
version of the human GPCR is X; i.e., the endogenous amino acid that is 16 residues from P¹
cannot be maintained in its endogenous form and must be mutated, as further disclosed herein.
Stated again, while it is preferred that in the non-endogenous version of the human GPCR, P¹ and
AA₁₅ remain in their endogenous forms (i.e., identical to their wild-type forms), once X is
identified and mutated, any and/or all of P¹ and AA₁₅ can be mutated. This applies to the nucleic
acid sequences as well. In those cases where the endogenous amino acid at position X is lysine,
then in the non-endogenous version of such GPCR, X is an amino acid other than lysine,
preferably alanine.

Accordingly, and as a hypothetical example, if the endogenous GPCR has the following
endogenous amino acid sequence at the above-noted positions:

P-AACCTTGGRRRDDDE-Q

then any of the following exemplary and hypothetical cassettes would fall within the scope of
the disclosure (any residue that differs from the endogenous sequence above is a non-endogenous amino acid):

P-AACCTTGGRRRDDDE-K
P-AACCTTHIGRRDDDE-K
P-ADEETTGGRRRDDDE-A
P-LLKFMSTWZLVAAPQ-K
A-LLKFMSTWZLVAAPQ-K
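Because typographic emphasis does not survive in a plain-text rendering, the non-endogenous positions in these hypothetical cassettes can instead be recovered by direct comparison with the endogenous cassette. The sketch below assumes the `P-AA15-X` string notation used above (C-terminus to N-terminus, with cassette positions numbered 1 for P¹ through 17 for X); the helper names are hypothetical. `within_scope` checks only the single mandatory requirement stated in the text, namely that X differ from the endogenous residue at that position.

```python
def parse_cassette(text: str):
    """Split a cassette such as 'P-AACCTTGGRRRDDDE-Q' into (P1, AA15, X)."""
    p1, aa15, x = (part.strip() for part in text.split("-"))
    if len(p1) != 1 or len(aa15) != 15 or len(x) != 1:
        raise ValueError(f"malformed cassette: {text!r}")
    return p1, aa15, x


def non_endogenous_positions(endogenous: str, variant: str) -> list:
    """1-based cassette positions (P1 = 1, X = 17) where the variant differs."""
    e = "".join(parse_cassette(endogenous))
    v = "".join(parse_cassette(variant))
    return [i + 1 for i, (a, b) in enumerate(zip(e, v)) if a != b]


def within_scope(endogenous: str, variant: str) -> bool:
    """Only X is required to change relative to the endogenous cassette."""
    return parse_cassette(variant)[2] != parse_cassette(endogenous)[2]


ENDOGENOUS = "P-AACCTTGGRRRDDDE-Q"
for variant in ("P-AACCTTGGRRRDDDE-K", "P-ADEETTGGRRRDDDE-A", "A-LLKFMSTWZLVAAPQ-K"):
    print(variant, within_scope(ENDOGENOUS, variant),
          non_endogenous_positions(ENDOGENOUS, variant))
```

The comparison is position-by-position only; it does not model the insertion of additional residues within AA₁₅.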
It is also possible to add amino acid residues within AA₁₅, but such an approach is not particularly
advanced. Indeed, in the most preferred embodiments, the only amino acid that differs in the non-
endogenous version of the human GPCR as compared with the endogenous version of that GPCR
is the amino acid in position X; mutation of this amino acid itself leads to constitutive activation
of the receptor.

Thus, in particularly preferred embodiments, P¹ and Pcodon are endogenous proline and an
endogenous nucleic acid encoding region encoding proline, respectively; and X and Xcodon are non-
endogenous lysine or alanine and a non-endogenous nucleic acid encoding region encoding lysine
or alanine, respectively, with lysine being most preferred. Because it is most preferred that the
non-endogenous versions of the human GPCRs which incorporate these mutations be
incorporated into mammalian cells and utilized for the screening of candidate compounds, the non-
endogenous human GPCR incorporating the mutation need not be purified and isolated per se (i.e.,
these are incorporated within the cellular membrane of a mammalian cell), although such purified
and isolated non-endogenous human GPCRs are well within the purview of this disclosure. Gene-
targeted and transgenic non-human mammals (preferably rats and mice) incorporating the non-
endogenous human GPCRs are also within the purview of this invention; in particular, gene-
targeted mammals are most preferred in that these animals will incorporate the non-endogenous
versions of the human GPCRs in place of the non-human mammal's endogenous GPCR-encoding
region (techniques for generating such non-human mammals to replace the non-human mammal's
protein encoding region with a human encoding region are well known; see, for example, U.S.
Patent No. 5,777,194).

It has been discovered that these changes to an endogenous human GPCR render the
GPCR constitutively active such that, as will be further disclosed herein, the non-endogenous,
constitutively activated version of the human GPCR can be utilized for, inter alia, the direct
screening of candidate compounds without the need for the endogenous ligand. Thus, methods
for using these materials, and products identified by these methods, are also within the purview of
the following disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows a generalized structure of a G protein-coupled receptor with the numbers
assigned to the transmembrane helices, the intracellular loops, and the extracellular loops.

Figure 2 schematically shows the two states, active and inactive, for a typical G
protein-coupled receptor and the linkage of the active state to the second messenger
transduction pathway.

Figure 3 is a sequence diagram of the preferred vector pCMV, including restriction
enzyme site locations.

Figure 4 is a diagrammatic representation of the signal measured comparing pCMV, non-
endogenous, constitutively active GPR30 inhibition of GPR6-mediated activation of CRE-Luc
reporter with endogenous GPR30 inhibition of GPR6-mediated activation of CRE-Luc
reporter.

Figure 5 is a diagrammatic representation of the signal measured comparing pCMV, non-
endogenous, constitutively activated GPR17 inhibition of GPR3-mediated activation of CRE-
Luc reporter with endogenous GPR17 inhibition of GPR3-mediated activation of CRE-Luc
reporter.

Figure 6 provides diagrammatic results of the signal measured comparing control
pCMV, endogenous APJ and non-endogenous APJ.

Figure 7 provides an illustration of IP3 production from the non-endogenous human 5-HT2A
receptor as compared to the endogenous version of this receptor.

Figure 8 presents dot-blot format results for GPR1 (8A), GPR30 (8B) and APJ (8C).

DETAILED DESCRIPTION 

The scientific literature that has evolved around receptors has adopted a number of terms
to refer to ligands having various effects on receptors. For clarity and consistency, the following
definitions will be used throughout this patent document. To the extent that these definitions
conflict with other definitions for these terms, the following definitions shall control:

AGONISTS shall mean compounds that activate the intracellular response when they bind 
to the receptor, or enhance GTP binding to membranes. 
AMINO ACID ABBREVIATIONS used herein are set out below:

ALANINE          ALA   A
ARGININE         ARG   R
ASPARAGINE       ASN   N
ASPARTIC ACID    ASP   D
CYSTEINE         CYS   C
GLUTAMIC ACID    GLU   E
GLUTAMINE        GLN   Q
GLYCINE          GLY   G
HISTIDINE        HIS   H
ISOLEUCINE       ILE   I
LEUCINE          LEU   L
LYSINE           LYS   K
METHIONINE       MET   M
PHENYLALANINE    PHE   F
PROLINE          PRO   P
SERINE           SER   S
THREONINE        THR   T
TRYPTOPHAN       TRP   W
TYROSINE         TYR   Y
VALINE           VAL   V



PARTIAL AGONISTS shall mean compounds which activate the intracellular response
when they bind to the receptor to a lesser degree/extent than do agonists, or enhance GTP binding
to membranes to a lesser degree/extent than do agonists.

ANTAGONIST shall mean compounds that competitively bind to the receptor at the
same site as the agonists but which do not activate the intracellular response initiated by the active
form of the receptor, and can thereby inhibit the intracellular responses by agonists or partial
agonists. ANTAGONISTS do not diminish the baseline intracellular response in the absence of
an agonist or partial agonist.

CANDIDATE COMPOUND shall mean a molecule (for example, and not limitation,
a chemical compound) which is amenable to a screening technique. Preferably, the phrase
"candidate compound" does not include compounds which were publicly known to be compounds
selected from the group consisting of inverse agonist, agonist or antagonist to a receptor, as
previously determined by an indirect identification process ("indirectly identified compound");
more preferably, it does not include an indirectly identified compound which has previously been
determined to have therapeutic efficacy in at least one mammal; and, most preferably, it does not
include an indirectly identified compound which has previously been determined to have
therapeutic utility in humans.

CODON shall mean a grouping of three nucleotides (or equivalents to nucleotides) which
generally comprise a nucleoside (adenosine (A), guanosine (G), cytidine (C), uridine (U) and
thymidine (T)) coupled to a phosphate group and which, when translated, encodes an amino acid.

COMPOUND EFFICACY shall mean a measurement of the ability of a compound to
inhibit or stimulate receptor functionality, as opposed to receptor binding affinity. A preferred
means of detecting compound efficacy is via measurement of, e.g., [35S]GTPγS binding, as further
disclosed in the Example section of this patent document.

CONSTITUTIVELY ACTIVATED RECEPTOR shall mean a receptor subject to
constitutive receptor activation. In accordance with the invention disclosed herein, a non-
endogenous, human constitutively activated G protein-coupled receptor is one that has been
mutated to include the amino acid cassette P¹ AA₁₅ X, as set forth in greater detail below.

CONSTITUTIVE RECEPTOR ACTIVATION shall mean stabilization of a receptor
in the active state by means other than binding of the receptor with its endogenous ligand or a
chemical equivalent thereof. Preferably, a G protein-coupled receptor subjected to constitutive
receptor activation in accordance with the invention disclosed herein evidences at least a 10%
difference in response (increase or decrease, as the case may be) to the signal measured for
constitutive activation as compared with the endogenous form of that GPCR, more preferably
about a 25% difference in such comparative response, and most preferably about a 50% difference
in such comparative response. When used for the purposes of directly identifying candidate
compounds, it is most preferred that the signal difference be at least about 50% such that there is
a sufficient difference between the endogenous signal and the non-endogenous signal to
differentiate between selected candidate compounds. In most instances, the "difference" will be
an increase in signal; however, with respect to Gs-coupled GPCRs, the "difference" measured is
preferably a decrease, as will be set forth in greater detail below.
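The preference tiers in this definition reduce to a comparison of two measured signals. The following sketch assumes that the percent difference is computed relative to the endogenous receptor's signal and treats the "about 25%" and "about 50%" figures as hard, inclusive cut-offs; both choices are assumptions made for illustration, and the function names are hypothetical. The absolute value is taken so that a decrease (as for the Gs-coupled receptors mentioned above) is scored the same way as an increase.

```python
def percent_difference(mutant_signal: float, endogenous_signal: float) -> float:
    """Signed percent change of the non-endogenous receptor's signal,
    relative to the endogenous receptor's signal (assumed baseline)."""
    if endogenous_signal == 0:
        raise ValueError("endogenous signal must be non-zero")
    return 100.0 * (mutant_signal - endogenous_signal) / endogenous_signal


def activation_tier(mutant_signal: float, endogenous_signal: float):
    """Map the magnitude of the difference onto the 10% / ~25% / ~50% tiers."""
    magnitude = abs(percent_difference(mutant_signal, endogenous_signal))
    for threshold, label in ((50.0, "most preferred"),
                             (25.0, "more preferred"),
                             (10.0, "preferred")):
        if magnitude >= threshold:
            return label
    return None  # below 10%: not counted as constitutive activation here


# Hypothetical readings in arbitrary units
print(activation_tier(150.0, 100.0))  # most preferred
print(activation_tier(80.0, 100.0))   # preferred
```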

CONTACT or CONTACTING shall mean bringing at least two moieties together,
whether in an in vitro system or an in vivo system.

DIRECTLY IDENTIFYING or DIRECTLY IDENTIFIED, in relationship to the
phrase "candidate compound", shall mean the screening of a candidate compound against a
constitutively activated G protein-coupled receptor, and assessing the compound efficacy of such
compound. This phrase is, under no circumstances, to be interpreted or understood to be
encompassed by or to encompass the phrase "indirectly identifying" or "indirectly identified."

ENDOGENOUS shall mean a material that is naturally produced by the genome of the
species. ENDOGENOUS in reference to, for example and not limitation, GPCR, shall mean that
which is naturally produced by a human, an insect, a plant, a bacterium, or a virus. By contrast,
the term NON-ENDOGENOUS in this context shall mean that which is not naturally produced
by the genome of a species. For example, and not limitation, a receptor which is not
constitutively active in its endogenous form, but which becomes constitutively active when mutated
using the cassettes disclosed herein, is most preferably referred to herein as a "non-
endogenous, constitutively activated receptor." Both terms can be utilized to describe both "in
vivo" and "in vitro" systems. For example, and not limitation, in a screening approach, the
endogenous or non-endogenous receptor may be in reference to an in vitro screening system
whereby the receptor is expressed on the cell surface of a mammalian cell. As a further example
and not limitation, where the genome of a mammal has been manipulated to include a non-
endogenous constitutively activated receptor, screening of a candidate compound by means of an
in vivo system is viable.

HOST CELL shall mean a cell capable of having a Plasmid and/or Vector incorporated
therein. In the case of a prokaryotic Host Cell, a Plasmid is typically replicated as an autonomous
molecule as the Host Cell replicates (generally, the Plasmid is thereafter isolated for introduction
into a eukaryotic Host Cell); in the case of a eukaryotic Host Cell, a Plasmid is integrated into the
cellular DNA of the Host Cell such that when the eukaryotic Host Cell replicates, the Plasmid
replicates. Preferably, for the purposes of the invention disclosed herein, the Host Cell is
eukaryotic, more preferably mammalian, and most preferably selected from the group consisting
of 293, 293T and COS-7 cells.
INDIRECTLY IDENTIFYING or INDIRECTLY IDENTIFIED means the traditional
approach to the drug discovery process involving identification of an endogenous ligand specific
for an endogenous receptor, screening of candidate compounds against the receptor for
determination of those which interfere and/or compete with the ligand-receptor interaction, and
assessing the efficacy of the compound for affecting at least one second messenger pathway
associated with the activated receptor.

INHIBIT or INHIBITING, in relationship to the term "response" shall mean that a 
response is decreased or prevented in the presence of a compound as opposed to in the absence of 
the compound. 
INVERSE AGONISTS shall mean compounds which bind to either the endogenous form
of the receptor or to the constitutively activated form of the receptor, and which inhibit the
baseline intracellular response initiated by the active form of the receptor below the normal base
level of activity which is observed in the absence of agonists or partial agonists, or decrease GTP
binding to membranes. Preferably, the baseline intracellular response is inhibited in the presence
of the inverse agonist by at least 30%, more preferably by at least 50%, and most preferably by at
least 75%, as compared with the baseline response in the absence of the inverse agonist.
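The percent-inhibition tiers in this definition reduce to simple arithmetic on a baseline reading and a reading taken in the presence of the compound. The following is a minimal sketch, assuming both readings are in the same arbitrary units (e.g., a reporter signal) with background already subtracted; the function names are hypothetical, and the alternative criterion (decreased GTP binding) is not modeled.

```python
def percent_inhibition(baseline: float, with_compound: float) -> float:
    """Percent decrease of the baseline (constitutive) response caused by a compound."""
    if baseline <= 0:
        raise ValueError("baseline response must be positive")
    return 100.0 * (baseline - with_compound) / baseline


def inverse_agonist_tier(baseline: float, with_compound: float) -> str:
    """Map percent inhibition onto the preference tiers named in the definition."""
    inhibition = percent_inhibition(baseline, with_compound)
    if inhibition >= 75:
        return "most preferred (>= 75%)"
    if inhibition >= 50:
        return "more preferred (>= 50%)"
    if inhibition >= 30:
        return "preferred (>= 30%)"
    return "below preferred threshold"


# Hypothetical readings in arbitrary units.
print(percent_inhibition(1000.0, 400.0))    # 60.0
print(inverse_agonist_tier(1000.0, 400.0))  # more preferred (>= 50%)
```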

KNOWN RECEPTOR shall mean an endogenous receptor for which the endogenous 
ligand specific for that receptor has been identified. 

LIGAND shall mean an endogenous, naturally occurring molecule specific for an
endogenous, naturally occurring receptor.

MUTANT or MUTATION in reference to an endogenous receptor's nucleic acid and/or
amino acid sequence shall mean a specified change or changes to such endogenous sequences such
that a mutated form of an endogenous, non-constitutively activated receptor evidences constitutive
activation of the receptor. In terms of equivalents to specific sequences, a subsequent mutated
form of a human receptor is considered to be equivalent to a first mutation of the human receptor
if (a) the level of constitutive activation of the subsequent mutated form of the receptor is
substantially the same as that evidenced by the first mutation of the receptor; and (b) the percent
sequence (amino acid and/or nucleic acid) homology between the subsequent mutated form of the
receptor and the first mutation of the receptor is at least about 80%, more preferably at least about
90% and most preferably at least 95%. Ideally, and owing to the fact that the most preferred
cassettes disclosed herein for achieving constitutive activation include a single amino acid and/or
codon change between the endogenous and the non-endogenous forms of the GPCR (i.e., X or
Xcodon), the percent sequence homology should be at least 98%.
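For a point substitution such as X, the homology criterion can be checked with a position-by-position identity count. This sketch rests on a simplifying assumption not made in the text: "homology" is treated as percent identity between two already-aligned, equal-length, ungapped sequences. The helper name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Position-by-position identity of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# A hypothetical 50-residue stretch carrying a single substitution (R -> K).
endogenous = "A" * 25 + "R" + "A" * 24
mutant = "A" * 25 + "K" + "A" * 24
print(percent_identity(endogenous, mutant))  # 98.0
```

A single substitution in a sequence of n residues gives 100·(n − 1)/n percent identity, which is at least 98% whenever n ≥ 50; full-length GPCRs are considerably longer than that.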

ORPHAN RECEPTOR shall mean an endogenous receptor for which the endogenous 
ligand specific for that receptor has not been identified or is not known. 

PHARMACEUTICAL COMPOSITION shall mean a composition comprising at least 
one active ingredient, whereby the composition is amenable to investigation for a specified,
efficacious outcome in a mammal (for example, and not limitation, a human). Those of ordinary 
skill in the art will understand and appreciate the techniques appropriate for determining whether 
an active ingredient has a desired efficacious outcome based upon the needs of the artisan. 

PLASMID shall mean the combination of a Vector and cDNA. Generally, a Plasmid is 
introduced into a Host Cell for the purpose of replication and/or expression of the cDNA as a
protein. 

STIMULATE or STIMULATING, in relationship to the term "response" shall mean that 
a response is increased in the presence of a compound as opposed to in the absence of the 
compound. 

TRANSVERSE or TRANSVERSING, in reference to either a defined nucleic acid
sequence or a defined amino acid sequence, shall mean that the sequence is located within at least
two different and defined regions. For example, in an amino acid sequence that is 10 amino acid
moieties in length, where 3 of the 10 moieties are in the TM6 region of a GPCR and the remaining
7 moieties are in the IC3 region of the GPCR, the 10 amino acid sequence can be described as
transversing the TM6 and IC3 regions of the GPCR.
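The 3-plus-7 example above can be checked mechanically once region boundaries are known. In the sketch below the boundaries (IC3 at 0-based positions 20 to 59, TM6 at 60 to 84) are hypothetical values chosen only for illustration, as are the function names.

```python
def residues_per_region(start: int, length: int, regions: dict[str, range]) -> dict[str, int]:
    """Count how many residues of a segment fall inside each named region (0-based)."""
    end = start + length
    return {
        name: max(0, min(end, span.stop) - max(start, span.start))
        for name, span in regions.items()
    }


def transverses(start: int, length: int, regions: dict[str, range]) -> bool:
    """True if the segment lies within at least two different defined regions."""
    counts = residues_per_region(start, length, regions)
    return sum(1 for n in counts.values() if n > 0) >= 2


# Hypothetical boundaries; IC3 precedes TM6 in N- to C-terminal order.
regions = {"TM6": range(60, 85), "IC3": range(20, 60)}
print(residues_per_region(53, 10, regions))  # {'TM6': 3, 'IC3': 7}
print(transverses(53, 10, regions))          # True
```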

VECTOR in reference to cDNA shall mean a circular DNA capable of incorporating at 
least one cDNA and capable of incorporation into a Host Cell. 

The order of the following sections is set forth for presentational efficiency and is not
intended, nor should it be construed, as a limitation on the disclosure or the claims to follow.
A. Introduction 

The traditional study of receptors has always proceeded from the a priori assumption 
(historically based) that the endogenous ligand must first be identified before discovery could 
proceed to find antagonists and other molecules that could affect the receptor. Even in cases
where an antagonist might have been known first, the search immediately extended to looking for 
the endogenous ligand. This mode of thinking has persisted in receptor research even after the 
discovery of constitutively activated receptors. What has not been heretofore recognized is that 
it is the active state of the receptor that is most useful for discovering agonists, partial agonists, and
inverse agonists of the receptor. For those diseases which result from an overly active receptor or
an under-active receptor, what is desired in a therapeutic drug is a compound which acts to 
diminish the active state of a receptor or enhance the activity of the receptor, respectively, not 
necessarily a drug which is an antagonist to the endogenous ligand. This is because a compound 
that reduces or enhances the activity of the active receptor state need not bind at the same site as
the endogenous ligand. Thus, as taught by a method of this invention, any search for therapeutic
compounds should start by screening compounds against the ligand-independent active state. 

Screening candidate compounds against non-endogenous, constitutively activated GPCRs 
allows for the direct identification of candidate compounds which act at these cell surface 
receptors, without requiring any prior knowledge or use of the receptor's endogenous ligand. By 
determining areas within the body where the endogenous versions of such GPCRs are expressed
and/or over-expressed, it is possible to determine related disease/disorder states which are
associated with the expression and/or over-expression of these receptors; such an approach is 
disclosed in this patent document. 
B. Disease/Disorder Identification and/or Selection

Most preferably, inverse agonists to the non-endogenous, constitutively activated GPCRs 
can be identified using the materials of this invention. Such inverse agonists are ideal candidates 
as lead compounds in drug discovery programs for treating diseases related to these receptors. 
Because of the ability to directly identify inverse agonists, partial agonists or agonists to these
receptors, thereby allowing for the development of pharmaceutical compositions, a search for
diseases and disorders associated with these receptors is possible. For example, scanning both
diseased and normal tissue samples for the presence of these receptors now becomes more than an
academic exercise or one which might be pursued along the path of identifying, in the case of an
orphan receptor, an endogenous ligand. Tissue scans can be conducted across a broad range of
healthy and diseased tissues. Such tissue scans provide a preferred first step in associating a 
specific receptor with a disease and/or disorder. 

Preferably, the DNA sequence of the endogenous GPCR is used to make a probe for either 
radiolabeled cDNA or RT-PCR identification of the expression of the GPCR in tissue samples. 
The presence of a receptor in a diseased tissue, or the presence of the receptor at elevated or
decreased concentrations in diseased tissue compared to a normal tissue, can be preferably utilized 
to identify a correlation with that disease. Receptors can equally well be localized to regions of 
organs by this technique. Based on the known functions of the specific tissues to which the 
receptor is localized, the putative functional role of the receptor can be deduced. 

C. A "Human GPCR Proline Marker" Algorithm and the Creation of
Non-Endogenous, Constitutively-Active Human GPCRs

Among the many challenges facing the biotechnology arts is the unpredictability in
gleaning genetic information from one species and correlating that information to another species;
nowhere in this art is this problem more pronounced than in the genetic sequences that encode
nucleic acids and proteins. Thus, for consistency and because of the highly unpredictable nature
of this art, the following invention is limited, in terms of mammals, to human GPCRs; applicability
of this invention to other mammalian species, while a potential possibility, is considered beyond
mere rote application.

In general, when attempting to apply common "rules" from one related protein sequence
to another or from one species to another, the art has typically resorted to sequence alignment, i.e.,
sequences are linearized and attempts are then made to find regions of commonality between two
or more sequences. While useful, this approach does not always yield meaningful
information. In the case of GPCRs, while the general structural motif is identical for all GPCRs,
the variations in lengths of the TMs, ECs and ICs make such alignment approaches from one
GPCR to another difficult at best. Thus, while it may be desirable to apply a consistent approach
to, e.g., constitutive activation from one GPCR to another, because of the great diversity in
sequence length, fidelity, etc., from one GPCR to the next, a generally applicable and readily
successful mutational alignment approach is in essence not possible. By analogy, such an
approach is akin to having a traveler start a journey at point A by giving the traveler dozens of
different maps to point B, without any scale or distance markers on any of the maps, and then
asking the traveler to find the shortest and most efficient route to destination B only by using the
maps. In such a situation, the task can be readily simplified by having (a) a common "place-marker"
on each map, and (b) the ability to measure the distance from the place-marker to
destination B; this, then, will allow the traveler to select the most efficient route from starting-point A
to destination B.

In essence, a feature of the invention is to provide such coordinates within human GPCRs
that readily allow for creation of a constitutively active form of the human GPCRs.

As those in the art appreciate, the transmembrane region of a cell is highly hydrophobic;
thus, using standard hydrophobicity plotting techniques, those in the art are readily able to
determine the TM regions of a GPCR, and specifically TM6 (this same approach is also
applicable to determining the EC and IC regions of the GPCR). It has been discovered that within
the TM6 region of human GPCRs, a common proline residue (generally near the middle of TM6)
acts as a constitutive activation "marker." By counting 15 amino acids from the proline marker,
the 16th amino acid (which is located in the IC3 loop), when mutated from its endogenous form
to a non-endogenous form, leads to constitutive activation of the receptor. For convenience, we
refer to this as the "Human GPCR Proline Marker" Algorithm. Although the non-endogenous
amino acid at this position can be any of the amino acids, most preferably, the non-endogenous
amino acid is lysine. While not wishing to be bound by any theory, we believe that this position
itself is unique and that the mutation at this location impacts the receptor to allow for constitutive
activation.
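The counting rule can be sketched in a few lines. This is a hypothetical illustration, not the inventors' software, and it rests on assumptions stated here and in the comments: the TM6 boundaries are supplied by the user (e.g., from a hydropathy plot) as a 0-based, half-open range; if more than one proline falls within TM6, the one nearest the TM6 midpoint is taken as the marker; and, because IC3 lies N-terminal to TM6, the 15 intervening residues are counted from the proline toward the N-terminus, consistent with the C-terminus to N-terminus orientation of the P1 AA15 X notation.

```python
def proline_marker_target(protein: str, tm6: range) -> tuple[int, int]:
    """Return (proline_index, target_index), both 0-based in N- to C-terminal numbering."""
    prolines = [i for i in tm6 if protein[i] == "P"]
    if not prolines:
        raise ValueError("no proline found in the supplied TM6 range")
    # Assumption: with several TM6 prolines, take the one closest to the TM6 midpoint.
    midpoint = (tm6.start + tm6.stop - 1) / 2
    p = min(prolines, key=lambda i: abs(i - midpoint))
    # IC3 precedes TM6, so skip 15 residues toward the N-terminus; X is the 16th.
    x = p - 16
    if x < 0:
        raise ValueError("target position falls before the start of the sequence")
    return p, x


def apply_cassette(protein: str, tm6: range, replacement: str = "K") -> str:
    """Return the sequence with residue X replaced (lysine by default)."""
    _, x = proline_marker_target(protein, tm6)
    return protein[:x] + replacement + protein[x + 1:]


# Toy sequence (not a real receptor): R at index 10, TM6 proline at index 26.
toy = "G" * 10 + "R" + "A" * 15 + "P" + "L" * 13
tm6 = range(19, 40)
print(proline_marker_target(toy, tm6))  # (26, 10)
print(apply_cassette(toy, tm6)[10])     # K
```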

We note that, for example, when the endogenous amino acid at the 16th position is already
lysine (as is the case with GPR4 and GPR32), then in order for X to be a non-endogenous amino
acid, it must be other than lysine; thus, in those situations where the endogenous GPCR has an
endogenous lysine residue at the 16th position, the non-endogenous version of that GPCR
preferably incorporates an amino acid other than lysine, preferably alanine, histidine or arginine,
at this position. Of further note, it has been determined that GPR4 appears to be linked to Gs and
active in its endogenous form (data not shown).
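The selection rule for X can be written as a small helper. This is a sketch under one stated assumption: the text names alanine, histidine and arginine as the alternatives when the endogenous residue is already lysine but does not rank them, so the order used below (A, H, R) is arbitrary. The function name is hypothetical.

```python
def choose_substitution(endogenous: str, alternatives: tuple[str, ...] = ("A", "H", "R")) -> str:
    """Pick the non-endogenous residue X (one-letter code) for the 16th position.

    Lysine is used unless the endogenous residue is already lysine; then the
    first listed alternative that differs from it is used (assumed order).
    """
    residue = endogenous.upper()
    if residue != "K":
        return "K"
    for candidate in alternatives:
        if candidate != residue:
            return candidate
    raise ValueError("no non-endogenous alternative supplied")


print(choose_substitution("R"))  # K
print(choose_substitution("K"))  # A
```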

Because there are only 20 naturally occurring amino acids (although the use of
non-naturally occurring amino acids is also viable), selection of a particular non-endogenous amino
acid for substitution at this 16th position is viable and allows for efficient selection of a
non-endogenous amino acid that fits the needs of the investigator. However, as noted, the more
preferred non-endogenous amino acids at the 16th position are lysine, histidine, arginine and
alanine, with lysine being most preferred. Those of ordinary skill in the art are credited with the
ability to readily determine proficient methods for changing the sequence of a codon to achieve
a desired mutation.
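At the nucleic acid level, the target residue corresponds to a codon whose first base lies at offset 3 × (amino acid index) in an in-frame coding sequence that begins at the initiation codon. The sketch below swaps that codon for a lysine codon (AAA or AAG). Choosing whichever lysine codon needs the fewest base changes is a design heuristic assumed here, not something stated in this document, and the function name is hypothetical.

```python
LYSINE_CODONS = ("AAA", "AAG")


def substitute_codon(cds: str, aa_index: int, new_codons: tuple[str, ...] = LYSINE_CODONS) -> str:
    """Replace the codon for residue aa_index (0-based) in an in-frame coding sequence."""
    start = 3 * aa_index
    old = cds[start:start + 3].upper()
    if len(old) != 3:
        raise IndexError("aa_index lies beyond the end of the coding sequence")
    # Heuristic: fewest nucleotide changes; ties go to the first codon listed.
    best = min(new_codons, key=lambda codon: sum(a != b for a, b in zip(codon, old)))
    return cds[:start] + best + cds[start + 3:]


# Toy coding sequence: ATG (Met) CGG (Arg) GCT (Ala); mutate residue 1 (Arg).
print(substitute_codon("ATGCGGGCT", 1))  # ATGAAGGCT
```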

It has also been discovered that occasionally, but not always, the proline residue marker
will be preceded in TM6 by WZ (i.e., W Z P1 AA15 X), where W is tryptophan and Z is any amino
acid residue.

Our discovery, amongst other things, negates the need for unpredictable and complicated 
sequence alignment approaches commonly used by the art. Indeed, the strength of our discovery, 
while an algorithm in nature, is that it can be applied in a facile manner to human GPCRs, with 
dexterous simplicity by those in the art, to achieve a unique and highly useful end-product, i.e., a 
constitutively activated version of a human GPCR. Because many years and significant amounts 
of money will be required to determine the endogenous ligands for the human GPCRs that the 
Human Genome Project is uncovering, the disclosed invention not only reduces the time necessary
to positively exploit this sequence information, but does so at significant cost savings. This approach truly
validates the importance of the Human Genome Project because it allows for the utilization of 
genetic information to not only understand the role of the GPCRs in, e.g., diseases, but also 
provides the opportunity to improve the human condition. 
D. Screening of Candidate Compounds

1. Generic GPCR screening assay techniques

When a G protein-coupled receptor becomes constitutively active, it couples to a G protein (e.g.,
Gq, Gs, Gi, Go) and stimulates release of GDP and subsequent binding of GTP to the G protein. The G
protein then acts as a GTPase and slowly hydrolyzes the GTP to GDP, whereby the receptor,
under normal conditions, becomes deactivated. However, constitutively activated receptors,
including the non-endogenous, human constitutively active GPCRs of the present invention,
continue to exchange GDP for GTP. A non-hydrolyzable analog of GTP, [35S]GTPγS, can be
used to monitor enhanced binding to G proteins present on membranes which express
constitutively activated receptors. It is reported that [35S]GTPγS can be used to monitor G protein
coupling to membranes in the absence and presence of ligand. An example of this monitoring,
among other examples well-known and available to those in the art, was reported by Traynor and
Nahorski in 1995. The preferred use of this assay system is for initial screening of candidate
compounds because the system is generically applicable to all G protein-coupled receptors
regardless of the particular G protein that interacts with the intracellular domain of the receptor.

2. Specific GPCR screening assay techniques

Once candidate compounds are identified using the "generic" G protein-coupled receptor assay
(i.e., an assay to select compounds that are agonists, partial agonists, or inverse agonists), further
screening to confirm that the compounds have interacted at the receptor site is preferred. For
example, a compound identified by the "generic" assay may not bind to the receptor, but may
instead merely "uncouple" the G protein from the intracellular domain.

a. Gs and Gi.

Gs stimulates the enzyme adenylyl cyclase. Gi (and Go), on the other hand, 
inhibit this enzyme. Adenylyl cyclase catalyzes the conversion of ATP to cAMP; thus, 
constitutively activated GPCRs that couple to the Gs protein are associated with increased
cellular levels of cAMP. On the other hand, constitutively activated GPCRs that couple to the
Gi (or Go) protein are associated with decreased cellular levels of cAMP. See, generally,
"Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.),
Nichols, J.G. et al., eds., Sinauer Associates, Inc. (1992). Thus, assays that detect cAMP can
be utilized to determine if a candidate compound is, e.g., an inverse agonist to a Gs-coupled receptor
(i.e., such a compound would decrease the levels of cAMP). A variety of approaches known
in the art for measuring cAMP can be utilized; a most preferred approach relies upon the use
of anti-cAMP antibodies in an ELISA-based format. Another type of assay that can be
utilized is a whole cell second messenger reporter system assay. Promoters on genes drive the
expression of the proteins that a particular gene encodes. Cyclic AMP drives gene expression by
promoting the binding of a cAMP-responsive DNA binding protein or transcription factor (CREB)
which then binds to the promoter at specific sites called cAMP response elements and drives the
expression of the gene. Reporter systems can be constructed which have a promoter containing
multiple cAMP response elements before the reporter gene, e.g., β-galactosidase or luciferase.
Thus, a constitutively activated Gs-linked receptor causes the accumulation of cAMP that then
activates the gene and expression of the reporter protein. The reporter protein, such as
β-galactosidase or luciferase, can then be detected using standard biochemical assays (Chen et al.
1995). With respect to GPCRs that link to Gi (or Go), and thus decrease levels of cAMP, an
approach to the screening of, e.g., inverse agonists, based upon utilization of receptors that link to
Gs (and thus increase levels of cAMP) is disclosed in the Example section with respect to GPR17
and GPR30.
b. Go and Gq.

Gq and Go are associated with activation of the enzyme phospholipase C,
which in turn hydrolyzes the phospholipid PIP2, releasing two intracellular messengers:
diacylglycerol (DAG) and inositol 1,4,5-trisphosphate (IP3). Increased accumulation of IP3
is associated with activation of Gq- and Go-associated receptors. See, generally, "Indirect
Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.), Nichols,
J.G. et al., eds., Sinauer Associates, Inc. (1992). Assays that detect IP3 accumulation can be
utilized to determine if a candidate compound is, e.g., an inverse agonist to a Gq- or
Go-associated receptor (i.e., such a compound would decrease the levels of IP3). Gq-associated
receptors can also be examined using an AP1 reporter assay in that Gq-dependent
phospholipase C causes activation of genes containing AP1 elements; thus, activated
Gq-associated receptors will evidence an increase in the expression of such genes, whereby
inverse agonists thereto will evidence a decrease in such expression, and agonists will
evidence an increase in such expression. Assays for such detection are commercially
available.
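The direction-of-change logic in subsections a and b can be summarized as a lookup table plus a sign test. This is a hypothetical sketch, not a validated analysis: the readout table is taken from the text above, while the function name and the 10% "no clear effect" cut-off are assumptions, and it does not distinguish partial from full agonists.

```python
# Readouts named in the text and the direction (+1 up, -1 down) in which
# constitutive activation of each G protein class moves them.
EFFECTS: dict[str, dict[str, int]] = {
    "Gs": {"cAMP": +1, "CRE reporter": +1},
    "Gi": {"cAMP": -1},
    "Go": {"cAMP": -1, "IP3": +1},
    "Gq": {"IP3": +1, "AP1 reporter": +1},
}


def interpret(g_protein: str, readout: str, basal: float, treated: float,
              tolerance: float = 0.10) -> str:
    """Label a compound by how it moves a readout relative to the basal level."""
    direction = EFFECTS[g_protein].get(readout)
    if direction is None:
        raise ValueError(f"{readout} is not a listed readout for {g_protein}")
    if basal <= 0:
        raise ValueError("basal level must be positive")
    change = (treated - basal) / basal
    if abs(change) < tolerance:  # assumed 10% cut-off
        return "no clear effect"
    # Same direction as constitutive activation: agonist-like; opposite: inverse-agonist-like.
    return "agonist-like" if change * direction > 0 else "inverse-agonist-like"


print(interpret("Gs", "cAMP", 100.0, 40.0))   # inverse-agonist-like
print(interpret("Gi", "cAMP", 100.0, 150.0))  # inverse-agonist-like
print(interpret("Gq", "IP3", 100.0, 130.0))   # agonist-like
```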



E. Medicinal Chemistry 

Generally, but not always, direct identification of candidate compounds is preferably 
conducted in conjunction with compounds generated via combinatorial chemistry techniques, 
whereby thousands of compounds are randomly prepared for such analysis. Generally, the 
results of such screening will be compounds having unique core structures; thereafter, these
compounds are preferably subjected to additional chemical modification around a preferred 
core structure(s) to further enhance the medicinal properties thereof. Such techniques are 
known to those in the art and will not be addressed in detail in this patent document.

F. Pharmaceutical Compositions 

Candidate compounds selected for further development can be formulated into 
pharmaceutical compositions using techniques well known to those in the art. Suitable 
pharmaceutically-acceptable carriers are available to those in the art; for example, see Remington's
Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Co. (Osol et al., eds.).

G. Other Utility 

Although a preferred use of the non-endogenous versions of the disclosed human GPCRs
is for the direct identification of candidate compounds as inverse agonists, agonists or partial
agonists (preferably for use as pharmaceutical agents), these receptors can also be utilized in
research settings. For example, in vitro and in vivo systems incorporating these receptors can be
utilized to further elucidate and understand the roles of the receptors in the human condition, both
normal and diseased, as well as to understand the role of constitutive activation as it applies to
the signaling cascade. A value in these non-endogenous receptors is that their
utility as a research tool is enhanced in that, because of their unique features, the disclosed
receptors can be used to understand the role of a particular receptor in the human body before the
endogenous ligand therefor is identified. Other uses of the disclosed receptors will become
apparent to those in the art based upon, inter alia, a review of this patent document.

EXAMPLES 

The following examples are presented for purposes of elucidation, and not limitation,
of the present invention. Following the teaching of this patent document that a mutational
cassette may be utilized in the IC3 loop of human GPCRs based upon a position relative to
a proline residue in TM6 to constitutively activate the receptor, and while specific nucleic acid
and amino acid sequences are disclosed herein, those of ordinary skill in the art are credited
with the ability to make minor modifications to these sequences while achieving the same or
substantially similar results reported below. Particular approaches to sequence mutations are
within the purview of the artisan based upon the particular needs of the artisan.

Example 1

Preparation of Endogenous Human GPCRs 

A variety of GPCRs were utilized in the Examples to follow. Some endogenous human
GPCRs were graciously provided in expression vectors (as acknowledged below) and other
endogenous human GPCRs were synthesized de novo using publicly-available sequence
information.

1. GPR1 (GenBank Accession Number: U13666) 

The human cDNA sequence for GPR1 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR1 cDNA (1.4 kb fragment) was excised from the pRcCMV
vector as an NdeI-XbaI fragment and was subcloned into the NdeI-XbaI site of pCMV vector (see
Figure 3). Nucleic acid (SEQ.ID.NO.: 1) and amino acid (SEQ.ID.NO.: 2) sequences for human
GPR1 were thereafter determined and verified.

2. GPR4 (GenBank Accession Numbers: L36148, U35399, U21051)

The human cDNA sequence for GPR4 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR4 cDNA (1.4 kb fragment) was excised from the pRcCMV
vector as an ApaI(blunted)-XbaI fragment and was subcloned (with most of the 5' untranslated
region removed) into the HindIII(blunted)-XbaI site of pCMV vector. Nucleic acid (SEQ.ID.NO.: 3)
and amino acid (SEQ.ID.NO.: 4) sequences for human GPR4 were thereafter determined and
verified.

3. GPR5 (GenBank Accession Number: L36149)

The cDNA for human GPR5 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 μM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 64°C
for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the sequence:
5'-TATGAATTCAGATGCTCTAAACGTCCCTGC-3' (SEQ.ID.NO.: 5)
and the 3' primer contained a BamHI site with the sequence:
5'-TCCGGATCCACCTGCACCTGCGCCTGCACC-3' (SEQ.ID.NO.: 6).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 7) and amino acid (SEQ.ID.NO.: 8)
sequences for human GPR5 were thereafter determined and verified.
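The restriction sites that the primer descriptions refer to can be located with a plain substring search. The sketch below uses the standard recognition sequences for EcoRI (GAATTC), BamHI (GGATCC) and HindIII (AAGCTT); all three are palindromic, so the same string also matches the reverse complement. The function name is hypothetical.

```python
# Recognition sequences (5'->3') for the enzymes named in these examples.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(primer: str) -> dict[str, int]:
    """Map each enzyme whose site occurs in the primer to its 0-based start position."""
    seq = primer.upper().replace(" ", "")
    return {enzyme: seq.find(site) for enzyme, site in SITES.items() if site in seq}


gpr5_primers = {
    "SEQ.ID.NO.: 5": "TATGAATTCAGATGCTCTAAACGTCCCTGC",
    "SEQ.ID.NO.: 6": "TCCGGATCCACCTGCACCTGCGCCTGCACC",
}
for name, seq in gpr5_primers.items():
    print(name, find_sites(seq))
# SEQ.ID.NO.: 5 {'EcoRI': 3}
# SEQ.ID.NO.: 6 {'BamHI': 3}
```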

4. GPR7 (GenBank Accession Number: U22491) 
The cDNA for human GPR7 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 μM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 62°C for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site
with the sequence:
5'-GCAAGCTTGGGGGACGCCAGGTCGCCGGCT-3' (SEQ.ID.NO.: 9)
and the 3' primer contained a BamHI site with the sequence:
5'-GCGGATCCGGACGCTGGGGGAGTCAGGCTGC-3' (SEQ.ID.NO.: 10).
The 1.1 kb PCR fragment was digested with HindIII and BamHI and cloned into the HindIII-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 11) and amino acid (SEQ.ID.NO.: 12)
sequences for human GPR7 were thereafter determined and verified.
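The cycling programs in these examples can be represented as data, which makes the total hold time easy to compare between receptors. This is an illustrative sketch with hypothetical names; it counts only the stated hold times, ignoring ramp times and any initial denaturation or final extension (none of which are stated in the text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature in °C
    seconds: int   # hold time per cycle


def hold_minutes(steps: list[Step], cycles: int) -> float:
    """Total hold time of a cycling program in minutes (ramp times ignored)."""
    return cycles * sum(step.seconds for step in steps) / 60


# GPR7 program from the text: 30 cycles of 94 °C 1 min, 62 °C 1 min, 72 °C 1 min 20 sec.
gpr7 = [Step(94, 60), Step(62, 60), Step(72, 80)]
print(hold_minutes(gpr7, 30))  # 100.0
```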

5. GPR8 (GenBank Accession Number: U22492) 

The cDNA for human GPR8 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 μM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:
5'-CGGAATTCGTCAACGGTCCCAGCTACAATG-3' (SEQ.ID.NO.: 13)
and the 3' primer contained a BamHI site with the sequence:
5'-ATGGATCCCAGGCCCTTCAGCACCGCAATAT-3' (SEQ.ID.NO.: 14).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. All 4 cDNA clones sequenced contained a possible
polymorphism involving a change of amino acid 206 from Arg to Gln. Aside from this
difference, nucleic acid (SEQ.ID.NO.: 15) and amino acid (SEQ.ID.NO.: 16) sequences for human
GPR8 were thereafter determined and verified.
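Differences such as the Arg to Gln change at position 206 can be listed automatically by comparing a clone's translated sequence with the reference. The sketch below assumes two already-aligned, equal-length protein sequences and uses compact one-letter notation (so the GPR8 change would read R206Q, and the GPR18 changes T12P, A86E, I97L and L310M). The function name is hypothetical, and the example uses toy sequences only.

```python
def substitutions(reference: str, observed: str, first_position: int = 1) -> list[str]:
    """List residue differences between two aligned, equal-length protein sequences,
    e.g. 'R206Q' for Arg206 -> Gln."""
    if len(reference) != len(observed):
        raise ValueError("sequences must have equal length")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference.upper(), observed.upper()),
                                         start=first_position)
        if ref != obs
    ]


# Toy sequences only (not GPR8).
print(substitutions("MTARLV", "MTAQLV"))  # ['R4Q']
```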

6. GPR9 (GenBank Accession Number: X95876) 

The cDNA for human GPR9 was generated and cloned into pCMV expression
vector as follows: PCR was performed using a clone (provided by Brian O'Dowd) as template and
pfu polymerase (Stratagene) with the buffer system provided by the manufacturer supplemented
with 10% DMSO, 0.25 μM of each primer, and 0.5 mM of each of the 4 nucleotides. The cycle
condition was 25 cycles of: 94°C for 1 min; 56°C for 1 min; and 72°C for 2.5 min. The 5' PCR
primer contained an EcoRI site with the sequence:
5'-ACGAATTCAGCCATGGTCCTTGAGGTGAGTGACCACCAAGTGCTAAAT-3'
(SEQ.ID.NO.: 17)
and the 3' primer contained a BamHI site with the sequence:
5'-GAGGATCCTGGAATGCGGGGAAGTCAG-3' (SEQ.ID.NO.: 18).
The 1.2 kb PCR fragment was digested with EcoRI and cloned into the EcoRI-SmaI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 19) and amino acid (SEQ.ID.NO.: 20) sequences
for human GPR9 were thereafter determined and verified.

7. GPR9-6 (GenBank Accession Number: U45982) 

The cDNA for human GPR9-6 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 μM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-TTAAGCTTGACCTAATGCCATCTTGTGTCC-3' (SEQ.ID.NO.: 21)
and the 3' primer contained a BamHI site with the sequence:
5'-TTGGATCCAAAAGAACCATGCACCTCAGAG-3' (SEQ.ID.NO.: 22).
The 1.2 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 23) and amino acid (SEQ.ID.NO.: 24)
sequences for human GPR9-6 were thereafter determined and verified.

8. GPR10 (GenBank Accession Number: U32672)

The human cDNA sequence for GPR10 was provided in pRcCMV by Brian
O'Dowd (University of Toronto). GPR10 cDNA (1.3 kb fragment) was excised from the
pRcCMV vector as an EcoRI-XbaI fragment and was subcloned into the EcoRI-XbaI site of pCMV
vector. Nucleic acid (SEQ.ID.NO.: 25) and amino acid (SEQ.ID.NO.: 26) sequences for human
GPR10 were thereafter determined and verified. 

9. GPR15 (GenBank Accession Number: U34806) 

The human cDNA sequence for GPR15 was provided in pcDNA3 by Brian
O'Dowd (University of Toronto). GPR15 cDNA (1.5 kb fragment) was excised from the
pcDNA3 vector as a HindIII-Bam fragment and was subcloned into the HindIII-Bam site of pCMV
vector. Nucleic acid (SEQ.ID.NO.: 27) and amino acid (SEQ.ID.NO.: 28) sequences for human
GPR15 were thereafter determined and verified.

10. GPR17 (GenBank Accession Number: Z94154) 

The cDNA for human GPR17 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 μM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:
5'-CTAGAATTCTGACTCCAGCCAAAGCATGAAT-3' (SEQ.ID.NO.: 29)
and the 3' primer contained a BamHI site with the sequence:
5'-GCTGGATCCTAAACAGTCTGCGCTCGGCCT-3' (SEQ.ID.NO.: 30).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 31) and amino acid (SEQ.ID.NO.: 32)
sequences for human GPR17 were thereafter determined and verified.

11. GPR18 (GenBank Accession Number: L42324)

The cDNA for human GPR18 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 54°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-ATAAGATGATCACCCTGAACAATCAAGAT-3' (SEQ.ID.NO.: 33)
and the 3' primer contained an EcoRI site with the sequence:
5'-TCCGAATTCATAACATTTCACTGTTTATATTGC-3' (SEQ.ID.NO.: 34).
The 1.0 kb PCR fragment was digested with EcoRI and cloned into the blunt-EcoRI site of pCMV
expression vector. All 8 cDNA clones sequenced contained 4 possible polymorphisms involving
changes of amino acid 12 from Thr to Pro, amino acid 86 from Ala to Glu, amino acid 97 from
Ile to Leu, and amino acid 310 from Leu to Met. Aside from these changes, the nucleic acid
(SEQ.ID.NO.: 35) and amino acid (SEQ.ID.NO.: 36) sequences for human GPR18 were thereafter
determined and verified.
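The GPR18 changes are spelled out in words, while later entries and Table B use the one-letter shorthand (e.g., S284C, N94K, F245K). A minimal sketch, assuming that shorthand means original residue, position, new residue; `parse_change` and `describe` are hypothetical helpers, and the one-letter strings for the GPR18 changes are rendered here for illustration:

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Original residue, 1-based position, new residue, e.g. "T12P".
CHANGE = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_change(code: str) -> tuple[str, int, str]:
    """Split a change such as 'T12P' into ('T', 12, 'P')."""
    m = CHANGE.fullmatch(code)
    if m is None or m.group(1) not in THREE_LETTER or m.group(3) not in THREE_LETTER:
        raise ValueError(f"not a single-residue change: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def describe(code: str) -> str:
    """Render the change in the wording used in the GPR18 entry."""
    old, pos, new = parse_change(code)
    return f"amino acid {pos} from {THREE_LETTER[old]} to {THREE_LETTER[new]}"


for code in ["T12P", "A86E", "I97L", "L310M"]:  # the GPR18 changes above
    print(describe(code))  # first line: amino acid 12 from Thr to Pro
```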

12. GPR20 (GenBank Accession Number: U66579) 
The cDNA for human GPR20 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-CCAAGCTTCCAGGCCTGGGGTGTGCTGG-3' (SEQ.ID.NO.: 37)
and the 3' primer contained a BamHI site with the sequence:
5'-ATGGATCCTGACCTTCGGCCCCTGGCAGA-3' (SEQ.ID.NO.: 38).
The 1.2 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 39) and amino acid (SEQ.ID.NO.: 40)
sequences for human GPR20 were thereafter determined and verified.

13. GPR21 (GenBank Accession Number: U66580) 

The cDNA for human GPR21 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-GAGAATTCACTCCTGAGCTCAAGATGAACT-3' (SEQ.ID.NO.: 41)
and the 3' primer contained a BamHI site with the sequence:
5'-CGGGATCCCCGTAACTGAGCCACTTCAGAT-3' (SEQ.ID.NO.: 42).
The 1.1 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 43) and amino acid (SEQ.ID.NO.: 44)
sequences for human GPR21 were thereafter determined and verified.

14. GPR22 (GenBank Accession Number: U66581)

The cDNA for human GPR22 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 50°C
for 1 min; and 72°C for 1.5 min. The 5' PCR primer was kinased with the sequence:
5'-TCCCCCGGGAAAAAAACCAACTGCTCCAAA-3' (SEQ.ID.NO.: 45)
and the 3' primer contained a BamHI site with the sequence:
5'-TAGGATCCATTTGAATGTGGATTTGGTGAAA-3' (SEQ.ID.NO.: 46).
The 1.38 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 47) and amino acid (SEQ.ID.NO.: 48)
sequences for human GPR22 were thereafter determined and verified.
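Each cycling program in these entries is a short list of (temperature, hold time) steps repeated 30 times. As a rough illustration only, the sketch below totals the hold times; it assumes no ramp time, no initial denaturation, and no final extension, none of which the text specifies, so an actual thermal-cycler run would take longer. `cycling_time_min` is a hypothetical helper.

```python
def cycling_time_min(steps: list[tuple[int, int]], cycles: int = 30) -> float:
    """Total hold time of the cycled steps, in minutes.

    Each step is (temperature in °C, hold time in seconds). Ramp times and any
    steps outside the repeated cycle are deliberately ignored.
    """
    per_cycle_s = sum(seconds for _temp, seconds in steps)
    return cycles * per_cycle_s / 60


gpr22 = [(94, 60), (50, 60), (72, 90)]  # 72°C for 1.5 min
gpr17 = [(94, 60), (56, 60), (72, 80)]  # 72°C for 1 min and 20 sec

print(cycling_time_min(gpr22))  # 105.0
print(cycling_time_min(gpr17))  # 100.0
```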

15. GPR24 (GenBank Accession Number: U71092) 

The cDNA for human GPR24 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-GTGAAGCTTGCCTCTGGTGCCTGCAGGAGG-3' (SEQ.ID.NO.: 49)
and the 3' primer contained an EcoRI site with the sequence:
5'-GCAGAATTCCCGGTGGCGTGTTGTGGTGCCC-3' (SEQ.ID.NO.: 50).
The 1.3 kb PCR fragment was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI
site of pCMV expression vector. The nucleic acid (SEQ.ID.NO.: 51) and amino acid
(SEQ.ID.NO.: 52) sequences for human GPR24 were thereafter determined and verified.

16. GPR30 (GenBank Accession Number: U63917) 

The cDNA for human GPR30 was generated and cloned as follows: the coding
sequence of GPR30 (1128 bp in length) was amplified from genomic DNA using the primers:
5'-GGCGGATCCATGGATGTGACTTCCCAA-3' (SEQ.ID.NO.: 53) and
5'-GGCGGATCCCTACACGGCACTGCTGAA-3' (SEQ.ID.NO.: 54).

The amplified product was then cloned into a commercially available vector, pCR2.1 (Invitrogen),
using a "TOPO-TA Cloning Kit" (Invitrogen, #K4500-01), following the manufacturer's instructions.
The full-length GPR30 insert was liberated by digestion with BamHI, separated from the vector
by agarose gel electrophoresis, and purified using a Sephaglas Bandprep™ Kit (Pharmacia,
#27-9285-01) following the manufacturer's instructions. The nucleic acid (SEQ.ID.NO.: 55) and
amino acid (SEQ.ID.NO.: 56) sequences for human GPR30 were thereafter determined and verified.
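The GPR30 primers can be read against the stated 1128 bp coding length. A minimal sketch, assuming each primer carries the same 9-nt 5' tail (GGCG followed by a BamHI site, GGATCC) ahead of the gene-specific sequence: the sense primer then begins at an ATG, and the antisense primer, rewritten on the coding strand, ends in a TAG stop codon. 1128 bp also divides evenly into 376 codons; if that length includes the stop codon, it would correspond to 375 residues, which is an inference rather than a figure given in the text. `revcomp` and `TAIL` are names introduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumption: both primers start with the same 9-nt tail, GGCG + BamHI site.
TAIL = "GGCGGATCC"
fwd = "GGCGGATCCATGGATGTGACTTCCCAA"  # SEQ.ID.NO.: 53
rev = "GGCGGATCCCTACACGGCACTGCTGAA"  # SEQ.ID.NO.: 54
assert fwd.startswith(TAIL) and rev.startswith(TAIL)

fwd_gene = fwd[len(TAIL):]           # gene-specific part, already on the coding strand
rev_gene = revcomp(rev[len(TAIL):])  # antisense part rewritten on the coding strand

print(fwd_gene[:3])                  # ATG
print(rev_gene[-3:])                 # TAG
codons, leftover = divmod(1128, 3)
print(codons, leftover)              # 376 0
```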

17. GPR31 (GenBank Accession Number: U65402)

The cDNA for human GPR31 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 58°C
for 1 min; and 72°C for 2 min. The 5' PCR primer contained an EcoRI site with the sequence:
5'-AAGGAATTCACGGCCGGGTGATGCCATTCCC-3' (SEQ.ID.NO.: 57)
and the 3' primer contained a BamHI site with the sequence:
5'-GGTGGATCCATAAACACGGGCGTTGAGGAC-3' (SEQ.ID.NO.: 58).

The 1.0 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 59) and amino acid (SEQ.ID.NO.:
60) sequences for human GPR31 were thereafter determined and verified.

18. GPR32 (GenBank Accession Number: AF045764)

The cDNA for human GPR32 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for
1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-TAAGAATTCCATAAAAATTATGGAATGG-3' (SEQ.ID.NO.: 243)
and the 3' primer contained a BamHI site with the sequence:
5'-CCAGGATCCAGCTGAAGTCTTCCATCATTC-3' (SEQ.ID.NO.: 244).
The 1.1 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 245) and amino acid (SEQ.ID.NO.:
246) sequences for human GPR32 were thereafter determined and verified.

19. GPR40 (GenBank Accession Number: AF024687) 
The cDNA for human GPR40 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-GCAGAATTCGGCGGCCCCATGGACCTGCCCCC-3' (SEQ.ID.NO.: 247)
and the 3' primer contained a BamHI site with the sequence:
5'-GCTGGATCCCCCGAGCAGTGGCGTTACTTC-3' (SEQ.ID.NO.: 248).
The 1 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 249) and amino acid (SEQ.ID.NO.: 250)
sequences for human GPR40 were thereafter determined and verified.

20. GPR41 (GenBank Accession Number: AF024688)

The cDNA for human GPR41 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-CTCAAGCTTACTCTCTCTCACCAGTGGCCAC-3' (SEQ.ID.NO.: 251)
and the 3' primer was kinased with the sequence:
5'-CCCTCCTCCCCCGGAGGACCTAGC-3' (SEQ.ID.NO.: 252).

The 1 kb PCR fragment was digested with HindIII and cloned into the HindIII-blunt site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 253) and amino acid (SEQ.ID.NO.: 254)
sequences for human GPR41 were thereafter determined and verified.

21. GPR43 (GenBank Accession Number: AF024690)

The cDNA for human GPR43 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for
1 min; and 72°C for 1 min and 10 sec. The 5' PCR primer contained a HindIII site with the
sequence:

5'-TTTAAGCTTCCCCTCCAGGATGCTGCCGGAC-3' (SEQ.ID.NO.: 255)
and the 3' primer contained an EcoRI site with the sequence:
5'-GGCGAATTCTGAAGGTCCAGGGAAACTGCTA-3' (SEQ.ID.NO.: 256).

The 1 kb PCR fragment was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 257) and amino acid (SEQ.ID.NO.: 258)
sequences for human GPR43 were thereafter determined and verified.
22. APJ (GenBank Accession Number: U03642)

Human APJ cDNA (in pRcCMV vector) was provided by Brian O'Dowd
(University of Toronto). The human APJ cDNA was excised from the pRcCMV vector as an
EcoRI-XbaI (blunted) fragment and was subcloned into the EcoRI-SmaI site of pCMV vector.
Nucleic acid (SEQ.ID.NO.: 61) and amino acid (SEQ.ID.NO.: 62) sequences for human APJ
were thereafter determined and verified.

23. BLR1 (GenBank Accession Number: X68149) 

The cDNA for human BLR1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using thymus cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-TGAGAATTCTGGTGACTCACAGCCGGCACAG-3' (SEQ.ID.NO.: 63)
and the 3' primer contained a BamHI site with the sequence:

5'-GCCGGATCCAAGGAAAAGCAGCAATAAAAGG-3' (SEQ.ID.NO.: 64). The 1.2 kb PCR
fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 65) and amino acid (SEQ.ID.NO.: 66) sequences
for human BLR1 were thereafter determined and verified.

24. CEPR (GenBank Accession Number: U77827)

The cDNA for human CEPR was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-CAAAGCTTGAAAGCTGCACGGTGCAGAGAC-3' (SEQ.ID.NO.: 67)
and the 3' primer contained a BamHI site with the sequence:
5'-GCGGATCCCGAGTCACACCCTGGCTGGGCC-3' (SEQ.ID.NO.: 68).

The 1.2 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 69) and amino acid (SEQ.ID.NO.: 70)
sequences for human CEPR were thereafter determined and verified.

25. EBI1 (GenBank Accession Number: L31581)
The cDNA for human EBI1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using thymus cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-ACAGAATTCCTGTGTGGTTTTACCGCCCAG-3' (SEQ.ID.NO.: 71)
and the 3' primer contained a BamHI site with the sequence:
5'-CTCGGATCCAGGCAGAAGAGTCGCCTATGG-3' (SEQ.ID.NO.: 72).
The 1.2 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 73) and amino acid (SEQ.ID.NO.:
74) sequences for human EBI1 were thereafter determined and verified.

26. EBI2 (GenBank Accession Number: L08177) 

The cDNA for human EBI2 was generated and cloned into pCMV expression
vector as follows: PCR was performed using a cDNA clone (graciously provided by Kevin Lynch,
University of Virginia Health Sciences Center; the vector utilized was not identified by the source)
as template and pfu polymerase (Stratagene) with the buffer system provided by the manufacturer
supplemented with 10% DMSO, 0.25 µM of each primer, and 0.5 mM of each of the 4
nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 60°C for 1 min; and 72°C for
1 min and 20 sec. The 5' PCR primer contained an EcoRI site with the sequence:
5'-CTGGAATTCACCTGGACCACCACCAATGGATA-3' (SEQ.ID.NO.: 75)
and the 3' primer contained a BamHI site with the sequence:
5'-CTCGGATCCTGCAAAGTTTGTCATACAGTT-3' (SEQ.ID.NO.: 76).

The 1.2 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 77) and amino acid (SEQ.ID.NO.:
78) sequences for human EBI2 were thereafter determined and verified.

27. ETBR-LP2 (GenBank Accession Number: D38449)

The cDNA for human ETBR-LP2 was generated and cloned into pCMV
expression vector as follows: PCR was performed using brain cDNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 65°C for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-CTGGAATTCTCCTGCTCATCCAGCCATGCGG-3' (SEQ.ID.NO.: 79)
and the 3' primer contained a BamHI site with the sequence:
5'-CCTGGATCCCCACCCCTACTGGGGCCTCAG-3' (SEQ.ID.NO.: 80).
The 1.5 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 81) and amino acid
(SEQ.ID.NO.: 82) sequences for human ETBR-LP2 were thereafter determined and verified.
28. GHSR (GenBank Accession Number: U60179) 

The cDNA for human GHSR was generated and cloned into pCMV expression
vector as follows: PCR was performed using hippocampus cDNA as template and TaqPlus
Precision polymerase (Stratagene) with the buffer system provided by the manufacturer, 0.25 µM
of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of:
94°C for 1 min; 68°C for 1 min; and 72°C for 1 min and 10 sec. For first round PCR, the 5' PCR
primer sequence was:
5'-ATGTGGAACGCGACGCCCAGCG-3' (SEQ.ID.NO.: 83)
and the 3' primer sequence was:

5'-TCATGTATTAATACTAGATTCT-3' (SEQ.ID.NO.: 84).

Two microliters of the first round PCR product were used as template for the second round PCR,
in which the 5' primer was kinased with the sequence:

5'-TACCATGTGGAACGCGACGCCCAGCGAAGAGCCGGGGT-3' (SEQ.ID.NO.: 85)
and the 3' primer contained an EcoRI site with the sequence:

5'-CGGAATTCATGTATTAATACTAGATTCTGTCCAGGCCCG-3' (SEQ.ID.NO.: 86).

The 1.1 kb PCR fragment was digested with EcoRI and cloned into the blunt-EcoRI site of pCMV
expression vector. Nucleic acid (SEQ.ID.NO.: 87) and amino acid (SEQ.ID.NO.: 88) sequences
for human GHSR were thereafter determined and verified.
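Judging from the printed sequences, the second-round GHSR primers appear to extend the first-round primers rather than anneal inside them: each first-round sequence occurs intact within its second-round counterpart, with extra bases added (including the EcoRI site on the 3' primer). The short sketch below simply checks this by substring search; it makes no claim about why the primers were designed this way.

```python
first_fwd = "ATGTGGAACGCGACGCCCAGCG"                    # SEQ.ID.NO.: 83
first_rev = "TCATGTATTAATACTAGATTCT"                    # SEQ.ID.NO.: 84
second_fwd = "TACCATGTGGAACGCGACGCCCAGCGAAGAGCCGGGGT"  # SEQ.ID.NO.: 85
second_rev = "CGGAATTCATGTATTAATACTAGATTCTGTCCAGGCCCG"  # SEQ.ID.NO.: 86

for outer, inner in [(first_fwd, second_fwd), (first_rev, second_rev)]:
    start = inner.find(outer)  # -1 would mean the first-round primer is not embedded
    added_5 = inner[:start]    # bases added on the 5' side
    added_3 = inner[start + len(outer):]  # bases added on the 3' side
    print(start, added_5, added_3)
# 4 TACC AAGAGCCGGGGT
# 6 CGGAAT GTCCAGGCCCG
```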

29. GPCR-CNS (GenBank Accession Number: AF017262) 

The cDNA for human GPCR-CNS was generated and cloned into pCMV
expression vector as follows: PCR was performed using brain cDNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 65°C for 1 min; and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the
sequence:

5'-GCAAGCTTGTGCCCTCACCAAGCCATGCGAGCC-3' (SEQ.ID.NO.: 89)
and the 3' primer contained an EcoRI site with the sequence:
5'-CGGAATTCAGCAATGAGTTCCGACAGAAGC-3' (SEQ.ID.NO.: 90).
The 1.9 kb PCR fragment was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI
site of pCMV expression vector. All nine clones sequenced contained a potential polymorphism
involving an S284C change. Aside from this difference, the nucleic acid (SEQ.ID.NO.: 91) and
amino acid (SEQ.ID.NO.: 92) sequences for human GPCR-CNS were thereafter determined and verified.
30. GPR-NGA (GenBank Accession Number: U55312) 
The cDNA for human GPR-NGA was generated and cloned into pCMV
expression vector as follows: PCR was performed using genomic DNA as template and rTth
polymerase (Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 56°C for 1 min; and 72°C for 1.5 min. The 5' PCR primer contained an EcoRI site with the
sequence:

5'-CAGAATTCAGAGAAAAAAAGTGAATATGGTTTTT-3' (SEQ.ID.NO.: 93)
and the 3' primer contained a BamHI site with the sequence:

5'-TTGGATCCCTGGTGCATAACAATTGAAAGAAT-3' (SEQ.ID.NO.: 94).

The 1.3 kb PCR fragment was digested with EcoRI and BamHI and cloned into the EcoRI-BamHI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 95) and amino acid (SEQ.ID.NO.:
96) sequences for human GPR-NGA were thereafter determined and verified.
31. H9 (GenBank Accession Number: U52219)

The cDNA for human H9 was generated and cloned into pCMV expression
vector as follows: PCR was performed using pituitary cDNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 62°C for
1 min; and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the sequence:
5'-GGAAAGCTTAACGATCCCCAGGAGCAACAT-3' (SEQ.ID.NO.: 97)
and the 3' primer contained a BamHI site with the sequence:
5'-CTGGGATCCTACGAGAGCATTTTTCACACAG-3' (SEQ.ID.NO.: 98).

The 1.9 kb PCR fragment was digested with HindIII and BamHI and cloned into the HindIII-BamHI
site of pCMV expression vector. When compared to the published sequences, a different isoform
with a 12 bp in-frame insertion in the cytoplasmic tail was also identified and designated "H9b."
Both isoforms contain two potential polymorphisms, P320S and G448A. Isoform H9a contained
another potential polymorphism, S493N, while isoform H9b contained two additional potential
polymorphisms, I502T and A532T (the latter corresponding to amino acid 528 of isoform H9a).
Nucleic acid (SEQ.ID.NO.: 99) and amino acid (SEQ.ID.NO.: 100) sequences for human H9 were
thereafter determined and verified (in the section below, both isoforms were mutated in
accordance with the Human GPCR Proline Marker Algorithm).
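The H9 numbering note follows from simple arithmetic: a 12 bp in-frame insertion adds 12 / 3 = 4 residues, so positions C-terminal to it are 4 higher in H9b than in H9a (532 - 4 = 528). A minimal sketch; `h9b_to_h9a` is a hypothetical helper that assumes the residue lies downstream of the insertion, whose exact position the text does not give.

```python
def h9b_to_h9a(position: int, insertion_bp: int = 12) -> int:
    """Convert an H9b residue number to H9a numbering.

    Assumes the residue lies C-terminal to the in-frame insertion; residues
    upstream of it would be numbered identically in both isoforms.
    """
    if insertion_bp % 3 != 0:
        raise ValueError("insertion would shift the reading frame")
    return position - insertion_bp // 3


print(h9b_to_h9a(532))  # 528
```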

32. HB954 (GenBank Accession Number: D38449) 

The cDNA for human HB954 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 58°C for 1 min;
and 72°C for 2 min. The 5' PCR primer contained a HindIII site with the sequence:
5'-TCCAAGCTTCGCCATGGGACATAACGGGAGCT-3' (SEQ.ID.NO.: 101)
and the 3' primer contained an EcoRI site with the sequence:
5'-CGTGAATTCCAAGAATTTACAATCCTTGCT-3' (SEQ.ID.NO.: 102).
The 1.6 kb PCR fragment was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 103) and amino acid
(SEQ.ID.NO.: 104) sequences for human HB954 were thereafter determined and verified.
33. HG38 (GenBank Accession Number: AF062006) 

The cDNA for human HG38 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 56°C for 1 min;
and 72°C for 1 min and 30 sec. Two PCR reactions were performed to separately obtain the 5' and
3' fragments. For the 5' fragment, the 5' PCR primer contained a HindIII site with the sequence:
5'-CCCAAGCTTCGGGCACCATGGACACCTCCC-3' (SEQ.ID.NO.: 259)
and the 3' primer contained a BamHI site with the sequence:

5'-ACAGGATCCAAATGCACAGCACTGGTAAGC-3' (SEQ.ID.NO.: 260).

This 1.5 kb 5' PCR fragment was digested with HindIII and BamHI and cloned into the HindIII-BamHI
site of pCMV. For the 3' fragment, the 5' PCR primer was kinased with the sequence:
5'-CTATAACTGGGTTACATGGTTTAAC-3' (SEQ.ID.NO.: 261)
and the 3' primer contained an EcoRI site with the sequence:
5'-TTTGAATTCACATATTAATTAGAGACATGG-3' (SEQ.ID.NO.: 262).
The 1.4 kb 3' PCR fragment was digested with EcoRI and subcloned into the blunt-EcoRI site of
pCMV vector. The 5' and 3' fragments were then ligated together through a common EcoRV site
to generate the full-length cDNA clone. Nucleic acid (SEQ.ID.NO.: 263) and amino acid
(SEQ.ID.NO.: 264) sequences for human HG38 were thereafter determined and verified.

34. HM74 (GenBank Accession Number: D10923) 

The cDNA for human HM74 was generated and cloned into pCMV expression
vector as follows: PCR was performed using either genomic DNA or thymus cDNA (pooled) as
template and rTth polymerase (Perkin Elmer) with the buffer system provided by the
manufacturer, 0.25 µM of each primer, and 0.2 mM of each of the 4 nucleotides. The cycle
condition was 30 cycles of: 94°C for 1 min; 65°C for 1 min; and 72°C for 1 min and 20 sec. The
5' PCR primer contained an EcoRI site with the sequence:
5'-GGAGAATTCACTAGGCGAGGCGCTCCATC-3' (SEQ.ID.NO.: 105)
and the 3' primer was kinased with the sequence:

5'-GGAGGATCCAGGAAACCTTAGGCCGAGTCC-3' (SEQ.ID.NO.: 106).
The 1.3 kb PCR fragment was digested with EcoRI and cloned into the EcoRI-SmaI site of
pCMV expression vector. Clones sequenced revealed a potential polymorphism involving an
N94K change. Aside from this difference, the nucleic acid (SEQ.ID.NO.: 107) and amino acid
(SEQ.ID.NO.: 108) sequences for human HM74 were thereafter determined and verified.

35. MIG (GenBank Accession Numbers: AF044600 and AF044601)
The cDNA for human MIG was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and TaqPlus Precision
polymerase (Stratagene) for first round PCR or pfu polymerase (Stratagene) for second round PCR
with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
(TaqPlus Precision) or 0.5 mM (pfu) of each of the 4 nucleotides. When pfu was used, 10%
DMSO was included in the buffer. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for: (a) 1 min for first round PCR; and (b) 2 min for second round PCR.
Because there is an intron in the coding region, two sets of primers were separately used to
generate overlapping 5' and 3' fragments. The 5' fragment PCR primers were:

5'-ACCATGGCTTGCAATGGCAGTGCGGCCAGGGGGCACT-3' (external sense)
(SEQ.ID.NO.: 109)
and

5'-CGACCAGGACAAACAGCATCTTGGTCACTTGTCTCCGGC-3' (internal antisense)
(SEQ.ID.NO.: 110).

The 3' fragment PCR primers were:

5'-GACCAAGATGCTGTTTGTCCTGGTCGTGGTGTTTGGCAT-3' (internal sense)
(SEQ.ID.NO.: 111) and

5'-CGGAATTCAGGATGGATCGGTCTCTTGCTGCGCCT-3' (external antisense with an
EcoRI site) (SEQ.ID.NO.: 112).

The 5' and 3' fragments were joined by performing second round PCR using the first round PCR
products as template together with the kinased external sense primer and the external antisense
primer. The 1.2 kb PCR fragment was digested with EcoRI and cloned into the blunt-EcoRI site of
pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 113) and amino acid (SEQ.ID.NO.: 114)
sequences for human MIG were thereafter determined and verified.
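The second round can join the two fragments because the internal primers share sequence. As a minimal sketch, reverse-complementing the internal antisense primer (SEQ.ID.NO.: 110) and comparing it with the internal sense primer (SEQ.ID.NO.: 111) shows a 26-nt stretch common to both, which is presumably the region through which the 5' and 3' fragments overlap. `revcomp` and `overlap` are hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


internal_antisense = "CGACCAGGACAAACAGCATCTTGGTCACTTGTCTCCGGC"  # SEQ.ID.NO.: 110
internal_sense = "GACCAAGATGCTGTTTGTCCTGGTCGTGGTGTTTGGCAT"      # SEQ.ID.NO.: 111

# Put the antisense primer on the sense strand, then measure the shared stretch.
k = overlap(revcomp(internal_antisense), internal_sense)
print(k, internal_sense[:k])  # 26 GACCAAGATGCTGTTTGTCCTGGTCG
```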

36. OGR1 (GenBank Accession Number: U48405) 
The cDNA for human OGR1 was generated and cloned into pCMV expression
vector as follows: PCR was performed using genomic DNA as template and rTth polymerase
(Perkin Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and
0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C
for 1 min; and 72°C for 1 min and 20 sec. The 5' PCR primer was kinased with the sequence:
5'-GGAAGCTTCAGGCCCAAAGATGGGGAACAT-3' (SEQ.ID.NO.: 115)
and the 3' primer contained a BamHI site with the sequence:
5'-GTGGATCCACCCGCGGAGGACCCAGGCTAG-3' (SEQ.ID.NO.: 116).
The 1.1 kb PCR fragment was digested with BamHI and cloned into the EcoRV-BamHI site
of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 117) and amino acid (SEQ.ID.NO.:
118) sequences for human OGR1 were thereafter determined and verified.
37. Serotonin 5HT2A

The cDNA encoding endogenous human 5HT2A receptor was obtained by RT-PCR
using human brain poly-A+ RNA; a 5' primer from the 5' untranslated region with an XhoI
restriction site:

5'-GACCTCGAGTCCTTCTACACCTCATC-3' (SEQ.ID.NO.: 119)

and a 3' primer from the 3' untranslated region containing an XbaI site:

5'-TGCTCTAGATTCCAGATAGGTGAAAACTTG-3' (SEQ.ID.NO.: 120)

PCR was performed using either TaqPlus™ Precision polymerase (Stratagene) or rTth™
polymerase (Perkin Elmer) with the buffer system provided by the manufacturers, 0.25 µM of each
primer, and 0.2 mM of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for
1 min; 57°C for 1 min; and 72°C for 2 min. The 1.5 kb PCR fragment was digested with XbaI
and subcloned into the EcoRV-XbaI site of pBluescript. The resulting cDNA clones were fully
sequenced and found to encode two amino acid changes from the published sequences. The first
was a T25N mutation in the N-terminal extracellular domain; the second was an H452Y
mutation. Because cDNA clones derived from two independent PCR reactions using thermostable
polymerases from two different commercial sources (TaqPlus™ from Stratagene and rTth™ from
Perkin Elmer) contained the same two mutations, these mutations are likely to represent sequence
polymorphisms rather than PCR errors. With these exceptions, the nucleic acid (SEQ.ID.NO.:
121) and amino acid (SEQ.ID.NO.: 122) sequences for human 5HT2A were thereafter determined
and verified.

38. Serotonin 5HT2C

The cDNA encoding endogenous human 5HT2C receptor was obtained from
human brain poly-A+ RNA by RT-PCR. The 5' and 3' primers were derived from the 5' and 3'
untranslated regions and contained the following sequences:
5'-GACCTCGAGGTTGCTTAAGACTGAAGC-3' (SEQ.ID.NO.: 123)
5'-ATTTCTAGACATATGTAGCTTGTACCG-3' (SEQ.ID.NO.: 124)

Nucleic acid (SEQ.ID.NO.: 125) and amino acid (SEQ.ID.NO.: 126) sequences for human 5HT2C
were thereafter determined and verified.

39. V28 (GenBank Accession Number: U20350) 

The cDNA for human V28 was generated and cloned into pCMV expression
vector as follows: PCR was performed using brain cDNA as template and rTth polymerase (Perkin
Elmer) with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of: 94°C for 1 min; 65°C for 1 min;
and 72°C for 1 min and 20 sec. The 5' PCR primer contained a HindIII site with the sequence:
5'-GGTAAGCTTGGCAGTCCACGCCAGGCCTTC-3' (SEQ.ID.NO.: 127)
and the 3' primer contained an EcoRI site with the sequence:
5'-TCCGAATTCTCTGTAGACACAAGGCTTTGG-3' (SEQ.ID.NO.: 128).
The 1.1 kb PCR fragment was digested with HindIII and EcoRI and cloned into the HindIII-EcoRI
site of pCMV expression vector. Nucleic acid (SEQ.ID.NO.: 129) and amino acid (SEQ.ID.NO.:
130) sequences for human V28 were thereafter determined and verified.



Example 2 

Preparation of Non-Endogenous Human GPCRs 



1. Site-Directed Mutagenesis 

Mutagenesis based upon the Human GPCR Proline Marker approach disclosed herein was
performed on the foregoing endogenous human GPCRs using the Transformer Site-Directed
Mutagenesis Kit (Clontech) according to the manufacturer's instructions. For this mutagenesis
approach, a Mutation Probe and a Selection Marker Probe were utilized (unless otherwise
indicated, the Selection Marker Probe of SEQ.ID.NO.: 132 was used throughout); the sequences of
these probes for the specified receptors are listed below in Table B (the parenthetical number is the
SEQ.ID.NO.). For convenience, the codon mutation incorporated into each human GPCR is also
noted, in standard form:



Table B

Receptor Identifier    Mutation Probe Sequence (5'-3')                   Selection Marker Probe Sequence (5'-3')
(Codon Mutation)       (SEQ.ID.NO.)                                      (SEQ.ID.NO.)

GPR1 (F245K)           GATCTCCAGTAGGCATAAGTGGACAATTCTGG (131)            CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT (132)
GPR4 (K223A)           AGAAGGCCAAGATCGCGCGGCTGGCCCTCA (133)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR5 (V224K)           CGGCGCCACCGCACGAAAAAGCTCATCTTC (134)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR7 (T250K)           GCCAAGAAGCGGGTGAAGTTCCTGGTGGTGGCA (135)           CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR8 (T259K)           CAGGCGGAAGGTGAAAGTCCTGGTCCTCGT (136)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR9 (M254K)           CGGCGCCTGCGGGCCAAGCGGCTGGTGGTGGTG (137)           CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR9-6 (L241K)         CCAAGCACAAAGCCAAGAAAGTGACCATCAC (138)             CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR10 (F276K)          GCGCCGGCGCACCAAATGCTTGCTGGTGGT (139)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR15 (I240K)          CAAAAAGCTGAAGAAATCTAAGAAGATCATCTTTATTGTCG (140)   CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR17 (V234K)          CAAGACCAAGGCAAAACGCATGATCGCCAT (141)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR18 (I231K)          GTCAAGGAGAAGTCCAAAAGGATCATCATC (142)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR20 (M240K)          CGCCGCGTGCGGGCCAAGCAGCTCCTGCTC (143)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR21 (A251K)          CCTGATAAGCGCTATAAAATGGTCCTGTTTCGA (144)           CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR22 (F312K)          GAAAGACAAAAGAGAGTCAAGAGGATGTCTTTATTG (145)        CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR24 (T304K)          CGGAGAAAGAGGGTGAAACGCACAGCCATCGCC (146)           CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR30 (L258K)          alternate approach; see below                     alternate approach; see below
GPR31 (Q221K)          AAGCTTCAGCGGGCCAAGGCACTGGTCACC (147)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR32 (K255A)          CATGCCAACCGGCCCGCGAGGCTGCTGCTGGT (279)            ACCAGCAGCAGCCTCGCGGGCCGGTTGGCATG (280)
GPR40 (A223K)          CGGAAGCTGCGGGCCAAATGGGTGGCCGGC (265)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR41 (A223K)          CAGAGGAGGGTGAAGGGGCTGTTGGCG (266)                 CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR43 (V221K)          GGCGGCGCCGAGCCAAGGGGCTGGCTGTGG (267)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
APJ (L247K)            alternate approach; see below                     alternate approach; see below
BLR1 (V258K)           CAGCGGCAGAAGGCAAAAAGGGTGGCCATC (148)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
CEPR (L258K)           CGGCAGAAGGCGAAGCGCATGATCCTCGCG (149)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
EBI1 (I262K)           GAGCGCAACAAGGCCAAAAAGGTGATCATC (150)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
EBI2 (L243K)           GGTGTAAACAAAAAGGCTAAAAACACAATTATTCTTATT (151)     CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
ETBR-LP2 (N358K)       GAGAGCCAGCTCAAGAGCACCGTGGTG (152)                 CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GHSR (V262K)           CCACAAGCAAACCAAGAAAATGCTGGCTGT (153)              CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPCR-CNS (N491K)       CTAGAGAGTCAGATGAAGTGTACAGTAGTGGCAC (155)          CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
GPR-NGA (I275K)        CGGACAAAAGTGAAAACTAAAAAGATGTTCCTCATT (156)        CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT
H9a and H9b (F236K)    GCTGAGGTTCGCAATAAACTAACCATGTTTGTG (157)           CTCCTTCGGTCCTCCTATCGTTGTCAGAAGT

HB954 
(H265K) 


GGGAGGCCGAGCTGAAAGCC 

ACCCTGCTC 

(158) 


CTCCTTCGGTCCTCCTATCGT 
TGTCAGAAGT 




HG38 
(V765K) 


GGGACTGCTCTATGAAAAAA 

CACATTGCCCTG 

(268) 1 


CATCAAGTGTATCATGTGCC 

AAGTACGCCC 

(154) 






HM74 
(I230K) 


CAAGATCAAGAGAGCCAAAA 

CCTTCATCATG 

(159) 


CTCCTTCGGTCCTCCTATCGT 
TGTCAGAAGT 


30 


MIG 

(T273K) 


CCGGAGACAAGTGAAGAAG 

ATGCTGTTTGTC 

(160) 


CTCCTTCGGTCCTCCTATCGT 
TGTCAGAAGT 




OGR1 
(Q227K) 


GCAAGGACCAGATCAAGCGG 

CTGGTGCTCA 

(161) 


CTCCTTCGGTCCTCCTATCGT 
TGTCAGAAGT 


3^ 


Serotonin 5HT 2A 
(C322K) 


alternate approach; see below 


alternate approach; see below 




Serotonin 5HT 2C 
(S310K) 


alternate approach; see below 


alternate approach; see below 
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V28 
(I230K) 



CAAGAAAGCCAAAGCC AAG 

AAACTGATCCTTCTG 

(162) 



CTCCTTCGGTCCTCCTATCGT 
TGTCAGAAGT 



The non-endogenous human GPCRs were then sequenced, and the derived and verified nucleic
acid and amino acid sequences are listed in the accompanying "Sequence Listing" appendix
to this patent document, as summarized in Table C below:

Table C

Mutated GPCR              Nucleic Acid Sequence Listing   Amino Acid Sequence Listing

GPR1 (F245K)              SEQ.ID.NO.: 163                 SEQ.ID.NO.: 164
GPR4 (K223A)              SEQ.ID.NO.: 165                 SEQ.ID.NO.: 166
GPR5 (V224K)              SEQ.ID.NO.: 167                 SEQ.ID.NO.: 168
GPR7 (T250K)              SEQ.ID.NO.: 169                 SEQ.ID.NO.: 170
GPR8 (T259K)              SEQ.ID.NO.: 171                 SEQ.ID.NO.: 172
GPR9 (M254K)              SEQ.ID.NO.: 173                 SEQ.ID.NO.: 174
GPR9-6 (L241K)            SEQ.ID.NO.: 175                 SEQ.ID.NO.: 176
GPR10 (F276K)             SEQ.ID.NO.: 177                 SEQ.ID.NO.: 178
GPR15 (I240K)             SEQ.ID.NO.: 179                 SEQ.ID.NO.: 180
GPR17 (V234K)             SEQ.ID.NO.: 181                 SEQ.ID.NO.: 182
GPR18 (I231K)             SEQ.ID.NO.: 183                 SEQ.ID.NO.: 184
GPR20 (M240K)             SEQ.ID.NO.: 185                 SEQ.ID.NO.: 186
GPR21 (A251K)             SEQ.ID.NO.: 187                 SEQ.ID.NO.: 188
GPR22 (F312K)             SEQ.ID.NO.: 189                 SEQ.ID.NO.: 190
GPR24 (T304K)             SEQ.ID.NO.: 191                 SEQ.ID.NO.: 192
GPR30 (L258K)             SEQ.ID.NO.: 193                 SEQ.ID.NO.: 194
GPR31 (Q221K)             SEQ.ID.NO.: 195                 SEQ.ID.NO.: 196
GPR32 (K255A)             SEQ.ID.NO.: 269                 SEQ.ID.NO.: 270
GPR40 (A223K)             SEQ.ID.NO.: 271                 SEQ.ID.NO.: 272
GPR41 (A223K)             SEQ.ID.NO.: 273                 SEQ.ID.NO.: 274
GPR43 (V221K)             SEQ.ID.NO.: 275                 SEQ.ID.NO.: 276
APJ (L247K)               SEQ.ID.NO.: 197                 SEQ.ID.NO.: 198
BLR1 (V258K)              SEQ.ID.NO.: 199                 SEQ.ID.NO.: 200
CEPR (L258K)              SEQ.ID.NO.: 201                 SEQ.ID.NO.: 202
EBI1 (I262K)              SEQ.ID.NO.: 203                 SEQ.ID.NO.: 204
EBI2 (L243K)              SEQ.ID.NO.: 205                 SEQ.ID.NO.: 206
ETBR-LP2 (N358K)          SEQ.ID.NO.: 207                 SEQ.ID.NO.: 208
GHSR (V262K)              SEQ.ID.NO.: 209                 SEQ.ID.NO.: 210
GPCR-CNS (N491K)          SEQ.ID.NO.: 211                 SEQ.ID.NO.: 212
GPR-NGA (I275K)           SEQ.ID.NO.: 213                 SEQ.ID.NO.: 214
H9a (F236K)               SEQ.ID.NO.: 215                 SEQ.ID.NO.: 216
H9b (F236K)               SEQ.ID.NO.: 217                 SEQ.ID.NO.: 218
HB954 (H265K)             SEQ.ID.NO.: 219                 SEQ.ID.NO.: 220
HG38 (V765K)              SEQ.ID.NO.: 277                 SEQ.ID.NO.: 278
HM74 (I230K)              SEQ.ID.NO.: 221                 SEQ.ID.NO.: 222
MIG (T273K)               SEQ.ID.NO.: 223                 SEQ.ID.NO.: 224
OGR1 (Q227K)              SEQ.ID.NO.: 225                 SEQ.ID.NO.: 226
Serotonin 5HT2A (C322K)   SEQ.ID.NO.: 227                 SEQ.ID.NO.: 228
Serotonin 5HT2C (S310K)   SEQ.ID.NO.: 229                 SEQ.ID.NO.: 230
V28 (I230K)               SEQ.ID.NO.: 231                 SEQ.ID.NO.: 232
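By way of illustration only, the standard-form labels used in Tables B and C (endogenous residue, residue number, substituted residue, e.g., F245K) can be parsed mechanically. The following minimal Python sketch assumes one-letter amino acid codes and a single substitution per label; the names `PointMutation` and `parse_mutation` are hypothetical.

```python
import re
from typing import NamedTuple

# The twenty standard one-letter amino acid codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Standard form: endogenous residue, residue number, substituted residue.
_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


class PointMutation(NamedTuple):
    endogenous: str      # residue present in the endogenous receptor
    position: int        # residue (codon) number
    non_endogenous: str  # residue introduced by the mutation


def parse_mutation(label: str) -> PointMutation:
    """Split a label such as 'F245K' into its three parts."""
    match = _MUTATION.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not in standard form: {label!r}")
    before, position, after = match.groups()
    for residue in (before, after):
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {residue!r} in {label!r}")
    return PointMutation(before, int(position), after)


if __name__ == "__main__":
    # Labels copied from Table C; most of them substitute lysine (K).
    labels = ["F245K", "K223A", "V224K", "K255A", "V765K"]
    to_lysine = [lab for lab in labels if parse_mutation(lab).non_endogenous == "K"]
    print(f"{len(to_lysine)} of {len(labels)} introduce lysine: {to_lysine}")
```

Applied to every entry of Table C, such a parser singles out GPR4 (K223A) and GPR32 (K255A) as the only receptors in which an endogenous lysine is replaced rather than introduced.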
2. Alternate Mutation Approaches for Employment of the Proline Marker
Algorithm: APJ; Serotonin 5HT2A; Serotonin 5HT2C; and GPR30

Although the above site-directed mutagenesis approach is particularly preferred, other
approaches can be utilized to create such mutations; those skilled in the art are credited
with the ability to select a mutation approach that fits their particular needs.

a. APJ

Preparation of the non-endogenous, human APJ receptor was accomplished by introducing the
L247K mutation. Two oligonucleotides containing this mutation were synthesized:
5'-GGCTTAAGAGCATCATCGTGGTGCTGGTG-3' (SEQ.ID.NO.: 233)
5'-GTCACCACCAGCACCACGATGATGCTCTTAAGCC-3' (SEQ.ID.NO.: 234)

The two oligonucleotides were annealed and used to replace the NaeI-BstEII fragment of human,
endogenous APJ to generate the non-endogenous version of human APJ.

b. Serotonin 5HT2A

cDNA containing the point mutation C322K was constructed by utilizing the restriction
enzyme site SphI, which encompasses amino acid 322. A primer containing the C322K mutation:
5'-CAAAGAAAGTACTGGGCATCGTCTTCTTCCT-3' (SEQ.ID.NO.: 235)
was used along with a primer from the 3' untranslated region of the receptor:
5'-TGCTCTAGATTCCAGATAGGTGAAAACTTG-3' (SEQ.ID.NO.: 236)
to perform PCR (under the conditions described above). The resulting PCR fragment was then
used to replace the 3' end of endogenous 5HT2A cDNA through the T4 polymerase-blunted SphI
site.

c. Serotonin 5HT2C

The cDNA containing an S310K mutation was constructed by replacing the StyI restriction
fragment containing amino acid 310 with synthetic double-stranded oligonucleotides that encode
the desired mutation. The sense strand utilized had the following sequence:

5'-CTAGGGGCACCATGCAGGCTATCAACAATGAAAGAAAAGCTAAGAAAGTC-3' (SEQ.ID.NO.: 237)

and the antisense strand utilized had the following sequence:

5'-CAAGGACTTTCTTAGCTTTTCTTTCATTGTTGATAGCCTGCATGGTGCCC-3' (SEQ.ID.NO.: 238)
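The annealed-oligonucleotide constructions of subsections a (APJ) and c (5HT2C) each depend on the duplex leaving single-stranded ends compatible with the vector cut sites. As a hedged illustration, the Python sketch below computes the 5' overhangs of an annealed pair, assuming a perfectly paired core and overhangs only at the 5' ends, and compares them with the overhangs generally produced by BstEII (G^GTNACC) and StyI (C^CWWGG); NaeI cuts bluntly. The function names are hypothetical.

```python
import re

# Watson-Crick complements for upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 5' overhangs left by the enzymes named in the text (N = any base; W = A or T).
OVERHANG_PATTERNS = {
    "BstEII": "GT[ACGT]AC",  # G^GTNACC
    "StyI": "C[AT][AT]G",    # C^CWWGG
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def duplex_overhangs(sense: str, antisense: str, min_core: int = 10) -> tuple[str, str]:
    """Return the single-stranded 5' ends (sense, antisense) of an annealed pair.

    Both strands are written 5'->3'. Assumes a perfectly paired core and
    single-stranded extensions only at the 5' ends (no 3' overhangs).
    """
    sense, antisense = sense.upper(), antisense.upper()
    rc_antisense = reverse_complement(antisense)
    for offset in range(len(sense) - min_core + 1):
        core = sense[offset:]
        if rc_antisense.startswith(core):
            return sense[:offset], antisense[: len(antisense) - len(core)]
    raise ValueError("no perfectly paired core found")


def matching_enzymes(overhang: str) -> list[str]:
    """Enzymes in OVERHANG_PATTERNS whose 5' overhang matches this end."""
    return [name for name, pattern in OVERHANG_PATTERNS.items()
            if re.fullmatch(pattern, overhang)]


if __name__ == "__main__":
    apj = duplex_overhangs("GGCTTAAGAGCATCATCGTGGTGCTGGTG",
                           "GTCACCACCAGCACCACGATGATGCTCTTAAGCC")
    ht2c = duplex_overhangs("CTAGGGGCACCATGCAGGCTATCAACAATGAAAGAAAAGCTAAGAAAGTC",
                            "CAAGGACTTTCTTAGCTTTTCTTTCATTGTTGATAGCCTGCATGGTGCCC")
    print(apj, [matching_enzymes(end) for end in apj])
    print(ht2c, [matching_enzymes(end) for end in ht2c])
```

For the APJ pair (SEQ.ID.NO.: 233/234) the sketch reports a blunt sense end, consistent with NaeI, and a GTCAC overhang matching BstEII; for the 5HT2C pair (SEQ.ID.NO.: 237/238) both ends (CTAG and CAAG) match the StyI pattern.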

d. GPR30

Prior to generating non-endogenous GPR30, several independent pCR2.1/GPR30 isolates
were sequenced in their entirety in order to identify clones with no PCR-generated mutations. A
clone having no mutations was digested with EcoRI, and the endogenous GPR30 cDNA fragment
was transferred into the CMV-driven expression plasmid pCI-neo (Promega) by digesting pCI-neo
with EcoRI and subcloning the EcoRI-liberated GPR30 fragment from pCR2.1/GPR30, to
generate pCI/GPR30. Thereafter, the leucine at codon 258 was mutated to a lysine using a
QuikChange™ Site-Directed Mutagenesis Kit (Stratagene, #200518), according to the manufacturer's
instructions, and the following primers:

5'-CGGCGGCAGAAGGCGAAACGCATGATCCTCGCGGT-3' (SEQ.ID.NO.: 239) and
5'-ACCGCGAGGATCATGCGTTTCGCCTTCTGCCGCCG-3' (SEQ.ID.NO.: 240)
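Site-directed mutagenesis of this kind uses two mutagenic primers that are exact reverse complements of each other. The Python sketch below, offered only as an illustration, checks that property for SEQ.ID.NO.: 239/240 and estimates primer melting temperature with the rule of thumb Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch, which we understand to be the guideline given in the QuikChange instruction manual; treat that formula, and the helper names, as assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def estimated_tm(seq: str, percent_mismatch: float) -> float:
    """Assumed guideline: Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch (N = length)."""
    return 81.5 + 0.41 * gc_percent(seq) - 675 / len(seq) - percent_mismatch


FORWARD = "CGGCGGCAGAAGGCGAAACGCATGATCCTCGCGGT"  # SEQ.ID.NO.: 239
REVERSE = "ACCGCGAGGATCATGCGTTTCGCCTTCTGCCGCCG"  # SEQ.ID.NO.: 240

if __name__ == "__main__":
    print("exact complements:", reverse_complement(FORWARD) == REVERSE)
    print(f"GC content: {gc_percent(FORWARD):.1f}%")
    # The mismatch percentage depends on the endogenous codon 258, which is not
    # reproduced here; 0% therefore gives an upper bound on the estimate.
    print(f"Tm upper bound: {estimated_tm(FORWARD, 0.0):.1f} C")
```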
Example 3

Receptor (Endogenous and Mutated) Expression

Although a variety of cells are available to the art for the expression of proteins, it is most
preferred that mammalian cells be utilized. The primary reason for this is practical:
utilization of, e.g., yeast cells for the expression of a GPCR, while possible,
introduces into the protocol a non-mammalian cell which may not (indeed, in the case of
yeast, does not) include the receptor-coupling, genetic-mechanism and secretory pathways that
have evolved for mammalian systems. Thus, results obtained in non-mammalian cells, while
of potential use, are not as preferred as those obtained from mammalian cells. Of the
mammalian cells, COS-7, 293 and 293T cells are particularly preferred, although the specific
mammalian cell utilized can be predicated upon the particular needs of the artisan.

Unless otherwise noted herein, the following protocol was utilized for the expression
of the endogenous and non-endogenous human GPCRs. Table D lists the mammalian cell and
number utilized (per 150 mm plate) for GPCR expression.
Table D

Receptor Name                     Mammalian Cell
(Endogenous or Non-Endogenous)    (Number Utilized)

GPR17                             293 (2 x 10^4)
GPR30                             293 (4 x 10^4)
APJ                               COS-7 (5 x 10^6)
ETBR-LP2                          293 (1 x 10^7); 293T (1 x 10^7)
GHSR                              293 (1 x 10^7); 293T (1 x 10^7)
MIG                               293 (1 x 10^7)
Serotonin 5HT2A                   293T (1 x 10^7)
Serotonin 5HT2C                   293T (1 x 10^7)

On day one, mammalian cells were plated out. On day two, two reaction tubes were
prepared (the proportions to follow for each tube are per plate): tube A was prepared by mixing
20 µg DNA (e.g., pCMV vector, pCMV vector with endogenous receptor cDNA, or pCMV
vector with non-endogenous receptor cDNA) in 1.2 ml serum-free DMEM (Irvine Scientific,
Irvine, CA); tube B was prepared by mixing 120 µl lipofectamine (Gibco BRL) in 1.2 ml serum-
free DMEM. Tubes A and B were then mixed by several inversions, followed by
incubation at room temperature for 30-45 min. The admixture is referred to as the "transfection
mixture". Plated cells were washed with 1X PBS, followed by addition of 10 ml serum-free
DMEM. 2.4 ml of the transfection mixture was then added to the cells, followed by incubation
for 4 hr at 37°C/5% CO2. The transfection mixture was then removed by aspiration, followed by
the addition of 25 ml of DMEM/10% Fetal Bovine Serum. Cells were then incubated at 37°C/5%
CO2. After 72 hr of incubation, cells were harvested and utilized for analysis.
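Because every quantity in the protocol above is stated per 150 mm plate, scaling to a batch of plates is simple multiplication. The Python sketch below is illustrative only: it records the stated per-plate amounts in a hypothetical `PlateRecipe` and multiplies them by the number of plates, without adding any pipetting overage.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlateRecipe:
    """Per-plate (150 mm) quantities stated in the transfection protocol."""
    dna_ug: float = 20.0              # tube A: plasmid DNA
    tube_a_dmem_ml: float = 1.2       # tube A: serum-free DMEM
    lipofectamine_ul: float = 120.0   # tube B: lipofectamine
    tube_b_dmem_ml: float = 1.2       # tube B: serum-free DMEM
    wash_dmem_ml: float = 10.0        # serum-free DMEM added after the PBS wash
    mixture_added_ml: float = 2.4     # transfection mixture added to the cells
    recovery_medium_ml: float = 25.0  # DMEM/10% FBS added after aspiration


def scale_recipe(plates: int, recipe: PlateRecipe = PlateRecipe()) -> dict[str, float]:
    """Multiply every per-plate quantity by the number of plates."""
    if plates < 1:
        raise ValueError("plates must be at least 1")
    return {name: value * plates for name, value in asdict(recipe).items()}


def lipid_to_dna_ratio(recipe: PlateRecipe = PlateRecipe()) -> float:
    """Microlitres of lipofectamine per microgram of DNA."""
    return recipe.lipofectamine_ul / recipe.dna_ug


if __name__ == "__main__":
    for name, amount in scale_recipe(4).items():
        print(f"{name}: {amount:g}")
    print(f"lipofectamine per ug DNA: {lipid_to_dna_ratio():g} ul")
```

Keeping the recipe in one immutable object makes the fixed 6 µl lipofectamine per µg DNA ratio explicit and prevents the per-plate values from drifting when batch totals are computed.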
1. Gi-Coupled Receptors: Co-Transfection with Gs-Coupled Receptors
In the case of GPR30, it has been determined that this receptor couples to the G protein Gi.
Gi is known to inhibit the enzyme adenylyl cyclase, which is necessary for catalyzing the
conversion of ATP to cAMP. Thus, a non-endogenous, constitutively activated form of GPR30
would be expected to be associated with decreased levels of cAMP. Although a non-endogenous,
constitutively activated form of GPR30 can be confirmed directly by measuring decreased levels
of cAMP, it is preferably confirmed through the cooperative use of a Gs-coupled receptor.
For example, a receptor that is Gs-coupled will stimulate adenylyl cyclase, and thus will be
associated with an increase in cAMP. The assignee of the present application has discovered that
the orphan receptor GPR6 is an endogenous, constitutively activated GPCR. GPR6 couples to the
Gs protein. Thus, when the two receptors are co-transfected, one can readily verify that a putative
GPR30 mutation leads to constitutive activation thereof: i.e., a cell co-expressing endogenous,
constitutively activated GPR6 and endogenous, non-constitutively activated GPR30 will evidence
an elevated level of cAMP when compared with a cell co-expressing endogenous, constitutively
active GPR6 and non-endogenous, constitutively activated GPR30 (the latter evidencing a
comparatively lower level of cAMP).
Assays that detect cAMP can be utilized to determine if a candidate compound is, e.g., an inverse
agonist to a Gs-associated receptor (i.e., such a compound would decrease the levels of cAMP) or
to a Gi-associated receptor (or a Go-associated receptor) (i.e., such a candidate compound would
increase the levels of cAMP). A variety of approaches known in the art for measuring cAMP can
be utilized; a preferred approach relies upon the use of anti-cAMP antibodies. Another approach,
and most preferred, utilizes a whole cell second messenger reporter system assay. Promoters on
genes drive the expression of the proteins that a particular gene encodes. Cyclic AMP drives gene
expression by promoting the binding of a cAMP-responsive DNA binding protein or transcription
factor (CREB), which then binds to the promoter at specific sites called cAMP response elements
and drives the expression of the gene. Reporter systems can be constructed which have a promoter
containing multiple cAMP response elements before the reporter gene, e.g., β-galactosidase or
luciferase. Thus, an activated receptor such as GPR6 causes the accumulation of cAMP, which then
activates the gene and expression of the reporter protein. Most preferably, 293 cells are co-
transfected with GPR6 (or another Gs-linked receptor) and GPR30 (or another Gi-linked receptor)
plasmids, preferably in a 1:1 ratio, most preferably in a 1:4 ratio. Because GPR6 is an
endogenous, constitutively active receptor that stimulates the production of cAMP, GPR6 strongly
activates the reporter gene and its expression. The reporter protein, such as β-galactosidase or
luciferase, can then be detected using standard biochemical assays (Chen et al. 1995). Co-
transfection of endogenous, constitutively active GPR6 with endogenous, non-constitutively active
GPR30 evidences an increase in the luciferase reporter protein. Conversely, co-transfection of
endogenous, constitutively active GPR6 with non-endogenous, constitutively active GPR30
evidences a drastic decrease in expression of luciferase. Several reporter plasmids are known and
available in the art for measuring a second messenger assay. It is considered well within the
purview of the skilled artisan to determine an appropriate reporter plasmid for a particular gene
expression based primarily upon the particular need of the artisan. Although a variety of cells are
available for expression, mammalian cells are most preferred, and of these types, 293 cells are
most preferred.

293 cells were transfected with the reporter plasmid pCRE-Luc/GPR6 and non-endogenous,
constitutively activated GPR30 using a Mammalian Transfection™ Kit (Stratagene, #200285)
CaPO4 precipitation protocol according to the manufacturer's instructions (see 28 Genomics 347
(1995) for the published endogenous GPR6 sequence). The precipitate contained 400 ng reporter,
80 ng CMV-expression plasmid (having a 1:4 ratio of GPR6 to endogenous GPR30 or non-endogenous
GPR30) and 20 ng CMV-SEAP (a transfection control plasmid encoding secreted alkaline
phosphatase). 50% of the precipitate was split into 3 wells of a 96-well tissue culture dish
(containing 4 x 10^4 cells/well); the remaining 50% was discarded. The following morning, the
media was changed. 48 hr after the start of the transfection, cells were lysed and examined for
luciferase activity using a LucLite™ Kit (Packard, Cat. # 6016911) and a Trilux 1450 Microbeta™
liquid scintillation and luminescence counter (Wallac) as per the vendor's instructions. The data
were analyzed using GraphPad Prism 2.0a (GraphPad Software Inc.).
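The DNA amounts in this precipitate can be traced down to a single well with exact fractions. The Python sketch below is illustrative and rests on one assumption: that the 80 ng of CMV-expression plasmid is divided between GPR6 and GPR30 in the stated 1:4 ratio. The helper names are hypothetical.

```python
from fractions import Fraction


def split_by_ratio(total_ng: int, parts: tuple[int, ...]) -> list[Fraction]:
    """Divide a DNA mass among plasmids in the given ratio (e.g. 1:4)."""
    whole = sum(parts)
    return [Fraction(total_ng) * part / whole for part in parts]


def per_well(components_ng: dict[str, Fraction], fraction_used: Fraction,
             wells: int) -> dict[str, Fraction]:
    """DNA mass of each component delivered to one well."""
    return {name: mass * fraction_used / wells for name, mass in components_ng.items()}


# Assumption: the 80 ng expression plasmid is split 1:4 (GPR6:GPR30).
gpr6_ng, gpr30_ng = split_by_ratio(80, (1, 4))
precipitate = {
    "reporter": Fraction(400),
    "GPR6": gpr6_ng,
    "GPR30": gpr30_ng,
    "CMV-SEAP": Fraction(20),
}
# 50% of the precipitate is spread over 3 wells; the rest is discarded.
well = per_well(precipitate, Fraction(1, 2), wells=3)

if __name__ == "__main__":
    for name, mass in well.items():
        print(f"{name}: {float(mass):.2f} ng per well")
```

Exact fractions avoid rounding drift: each well receives one sixth of every component, i.e., 250/3 ng of DNA in total.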

With respect to GPR17, which has also been determined to be Gi-linked, a modification
of the foregoing approach was utilized, based upon, inter alia, use of another Gs-linked
endogenous receptor, GPR3 (see 23 Genomics 609 (1994) and 24 Genomics 391 (1994)). Most
preferably, 293 cells are utilized. These cells were plated out on 96-well plates at a density of
2 x 10^4 cells per well and were transfected using Lipofectamine Reagent (BRL) the following day
according to the manufacturer's instructions. A DNA/lipid mixture was prepared for each 6-well
transfection as follows: 260 ng of plasmid DNA in 100 µl of DMEM was gently mixed with 2 µl
of lipid in 100 µl of DMEM (the 260 ng of plasmid DNA consisted of 200 ng of an 8xCRE-Luc
reporter plasmid (see below), 50 ng of pCMV comprising endogenous receptor or non-endogenous
receptor or pCMV alone, and 10 ng of a GPR3 expression plasmid (GPR3 in pcDNA3
(Invitrogen))). The 8xCRE-Luc reporter plasmid was prepared as follows: vector SRIF-β-gal was
obtained by cloning the rat somatostatin promoter (-71/+51) at the BglII-HindIII site in the
pβgal-Basic Vector (Clontech). Eight (8) copies of the cAMP response element were obtained by
PCR from an adenovirus template AdpCF126CCRE8 (see 7 Human Gene Therapy 1883 (1996)) and
cloned into the SRIF-β-gal vector at the KpnI-BglII site, resulting in the 8xCRE-β-gal reporter
vector. The 8xCRE-Luc reporter plasmid was generated by replacing the beta-galactosidase gene in
the 8xCRE-β-gal reporter vector with the luciferase gene obtained from the pGL3-basic vector
(Promega) at the HindIII-BamHI site. Following 30 min incubation at room temperature, the
DNA/lipid mixture was diluted with 400 µl of DMEM and 100 µl of the diluted mixture was added
to each well. 100 µl of DMEM with 10% FCS was added to each well after a 4 hr incubation in
a cell culture incubator. The next morning, the medium on the transfected cells was changed to
200 µl/well of DMEM with 10% FCS. Eight (8) hours later, after one wash with PBS, the wells
were changed to 100 µl/well of DMEM without phenol red. Luciferase activity was measured the
next day using the LucLite™ reporter gene assay kit (Packard) following the manufacturer's
instructions and read on a 1450 MicroBeta™ scintillation and luminescence counter (Wallac).
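The volumes in the GPR17 protocol explain why one DNA/lipid mixture serves a "6-well transfection": 100 µl (DNA) + 100 µl (lipid) + 400 µl (diluent) gives 600 µl, dispensed at 100 µl per well. The Python sketch below reproduces that bookkeeping, assuming the 2 µl of lipid is counted within its 100 µl of DMEM; the constant and function names are hypothetical.

```python
from fractions import Fraction

# Per "6-well transfection" quantities stated in the GPR17 protocol.
PLASMIDS_NG = {
    "8xCRE-Luc reporter": 200,
    "pCMV construct (receptor or empty)": 50,
    "GPR3 in pcDNA3": 10,
}
DNA_DMEM_UL = 100    # DMEM carrying the plasmid DNA
LIPID_UL = 2         # lipofectamine
LIPID_DMEM_UL = 100  # DMEM carrying the lipid
DILUENT_UL = 400     # DMEM added after the 30 min incubation
PER_WELL_UL = 100    # diluted mixture added to each well


def wells_served() -> int:
    """Assumes the 2 ul of lipid is included in its 100 ul of DMEM."""
    total_ul = DNA_DMEM_UL + LIPID_DMEM_UL + DILUENT_UL
    return total_ul // PER_WELL_UL


def per_well_ng() -> dict[str, Fraction]:
    """Plasmid mass reaching one well, as exact fractions of a nanogram."""
    wells = wells_served()
    return {name: Fraction(ng, wells) for name, ng in PLASMIDS_NG.items()}


if __name__ == "__main__":
    print("total DNA:", sum(PLASMIDS_NG.values()), "ng")
    print("wells per mixture:", wells_served())
    print("lipid per well:", Fraction(LIPID_UL, wells_served()), "ul")
    for name, ng in per_well_ng().items():
        print(f"{name}: {float(ng):.1f} ng per well")
```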

Figure 4 evidences that constitutively active GPR30 inhibits GPR6-mediated
activation of the CRE-Luc reporter in 293 cells. Luciferase was measured at about 4.1 relative
light units with the expression vector pCMV alone. Endogenous GPR30 expressed luciferase at about
8.5 relative light units, whereas the non-endogenous, constitutively active GPR30 (L258K)
expressed luciferase at about 3.8 and 3.1 relative light units, respectively. Co-transfection of
endogenous GPR6 with endogenous GPR30, at a 1:4 ratio, drastically increased luciferase
expression to about 104.1 relative light units. Co-transfection of endogenous GPR6 with non-
endogenous GPR30 (L258K), at the same ratio, drastically decreased the expression, to about
18.2 and 29.5 relative light units, respectively. Similar results were observed for GPR17
co-transfected with GPR3, as set forth in Figure 5.
Example 3

Assays for Determination of Constitutive Activity
of Non-Endogenous GPCRs

A. Membrane Binding Assays

1. [35S]GTPγS Assay

When a G protein-coupled receptor is in its active state, either as a result of ligand binding
or constitutive activation, the receptor couples to a G protein and stimulates the release of GDP
and subsequent binding of GTP to the G protein. The alpha subunit of the G protein-receptor
complex acts as a GTPase and slowly hydrolyzes the GTP to GDP, at which point the receptor
normally is deactivated. Constitutively activated receptors continue to exchange GDP for GTP.
The non-hydrolyzable GTP analog [35S]GTPγS can be utilized to demonstrate enhanced binding
of [35S]GTPγS to membranes expressing constitutively activated receptors. The advantages of
using [35S]GTPγS binding to measure constitutive activation are that (a) it is generically applicable
to all G protein-coupled receptors; and (b) it is proximal, at the membrane surface, making it less
likely to pick up molecules that affect the intracellular cascade.

The assay utilizes the ability of G protein-coupled receptors to stimulate [35S]GTPγS
binding to membranes expressing the relevant receptors. The assay can, therefore, be used in
the direct identification method to screen candidate compounds against known, orphan and
constitutively activated G protein-coupled receptors. The assay is generic and has application
to drug discovery at all G protein-coupled receptors.

The [35S]GTPγS assay was incubated for 1 hour in pH 7.4 binding buffer containing 20 mM HEPES;
between 1 and about 20 mM MgCl2 (this amount can be adjusted for optimization of results,
although 20 mM is preferred); between about 0.3 and about 1.2 nM [35S]GTPγS (this amount can be
adjusted for optimization of results, although 1.2 nM is preferred); 12.5 to 75 µg membrane
protein (e.g., from COS-7 cells expressing the receptor; this amount can be adjusted for
optimization, although 75 µg is preferred); and 1 µM GDP (this amount can be changed for
optimization). Wheatgerm agglutinin beads (25 µl; Amersham) were then added and the mixture
was incubated for another 30 minutes at room temperature. The tubes were then centrifuged at
1500 x g for 5 minutes at room temperature and then counted in a scintillation counter.

A less costly but equally applicable alternative has been identified which also meets the
needs of large-scale screening. Flash plates™ and Wallac™ scintistrips may be utilized to format
a high-throughput [35S]GTPγS binding assay. Furthermore, using this technique, the assay can be
utilized for known GPCRs to monitor tritiated ligand binding to the receptor while simultaneously
monitoring efficacy via [35S]GTPγS binding. This is possible because the Wallac beta counter can
switch energy windows to look at both tritium- and 35S-labeled probes. This assay may also be
used to detect other types of membrane activation events resulting in receptor activation. For
example, the assay may be used to monitor 32P phosphorylation of a variety of receptors (both
G protein-coupled and tyrosine kinase receptors). When the membranes are centrifuged to the
bottom of the well, the bound [35S]GTPγS or the 32P-phosphorylated receptor will activate the
scintillant that is coated on the wells. Scintistrips (Wallac) have been used to demonstrate this
principle. In addition, the assay also has utility for measuring ligand binding to receptors using
radioactively labeled ligands. In a similar manner, when the radiolabeled bound ligand is
centrifuged to the bottom of the well, the scintistrip label comes into proximity with the
radiolabeled ligand, resulting in activation and detection.

Representative results comparing Control (pCMV), Endogenous APJ and Non-Endogenous APJ,
based upon the foregoing protocol, are set forth in Figure 6.
2. Adenylyl Cyclase

A Flash Plate™ Adenylyl Cyclase kit (New England Nuclear; Cat. No. SMP004A)
designed for cell-based assays was modified for use with crude plasma membranes. The Flash
Plate wells contain a scintillant coating which also contains a specific antibody recognizing cAMP.
The cAMP generated in the wells was quantitated by a direct competition for binding of
radioactive cAMP tracer to the cAMP antibody. The following serves as a brief protocol for the
measurement of changes in cAMP levels in membranes that express the receptors.

Transfected cells were harvested approximately three days after transfection. Membranes
were prepared by homogenization of suspended cells in buffer containing 20 mM HEPES, pH 7.4
and 10 mM MgCl2. Homogenization was performed on ice using a Brinkman Polytron™ for
approximately 10 seconds. The resulting homogenate was centrifuged at 49,000 x g for 15
minutes at 4°C. The resulting pellet was then resuspended in buffer containing 20 mM HEPES,
pH 7.4 and 0.1 mM EDTA, homogenized for 10 seconds, followed by centrifugation at 49,000 x
g for 15 minutes at 4°C. The resulting pellet can be stored at -80°C until utilized. On the day of
measurement, the membrane pellet was slowly thawed at room temperature and resuspended in buffer
containing 20 mM HEPES, pH 7.4 and 10 mM MgCl2 (these amounts can be optimized, although
the values listed herein are preferred) to yield a final protein concentration of 0.60 mg/ml (the
resuspended membranes were placed on ice until use).

cAMP standards and Detection Buffer (comprising 2 nCi of tracer [125I]cAMP (100 µl) to
11 ml Detection Buffer) were prepared and maintained in accordance with the manufacturer's
instructions. Assay Buffer was prepared fresh for screening and contained 20 mM HEPES, pH 7.4,
10 mM MgCl2, 20 mM (Sigma), 0.1 units/ml creatine phosphokinase (Sigma), 50 µM GTP
(Sigma), and 0.2 mM ATP (Sigma); Assay Buffer can be stored on ice until utilized. The assay
was initiated by addition of 50 µl of Assay Buffer followed by addition of 50 µl of membrane
suspension to the NEN Flash Plate. The resultant assay mixture was incubated for 60 minutes at
room temperature followed by addition of 100 µl of Detection Buffer. Plates were then incubated
an additional 2-4 hours followed by counting in a Wallac MicroBeta scintillation counter. Values of
cAMP/well were extrapolated from a standard cAMP curve which is contained within each assay
plate. The foregoing assay was utilized with respect to analysis of MIG.
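In this competition format, a higher cAMP content in a well displaces more [125I]cAMP tracer from the antibody and therefore gives fewer counts, so the standard curve falls monotonically. The Python sketch below reads a sample off such a curve by linear interpolation in log10(cAMP) between the two bracketing standards. This is a simplified stand-in for the curve fitting actually used (often a four-parameter logistic fit), and it assumes only standards with non-zero cAMP are supplied; the function name and the standard values in the usage example are hypothetical.

```python
import math
from bisect import bisect_left


def camp_from_counts(counts: float, standards: list[tuple[float, float]]) -> float:
    """Read a well's cAMP amount off the plate's standard curve.

    `standards` holds (cAMP amount, counts) pairs with cAMP > 0. Counts fall
    as cAMP rises, so the curve is monotonic; interpolate linearly in
    log10(cAMP) between the two standards that bracket the sample.
    """
    points = sorted(standards, key=lambda point: point[1])  # ascending counts
    counts_axis = [c for _, c in points]
    if not counts_axis[0] <= counts <= counts_axis[-1]:
        raise ValueError("sample lies outside the standard curve")
    i = bisect_left(counts_axis, counts)
    if counts_axis[i] == counts:
        return points[i][0]
    (amount_a, counts_a), (amount_b, counts_b) = points[i - 1], points[i]
    fraction = (counts - counts_a) / (counts_b - counts_a)
    log_a, log_b = math.log10(amount_a), math.log10(amount_b)
    return 10 ** (log_a + fraction * (log_b - log_a))


if __name__ == "__main__":
    # Hypothetical standards: (pmol cAMP per well, counts per minute).
    curve = [(1.0, 8000.0), (10.0, 5000.0), (100.0, 2000.0)]
    print(f"{camp_from_counts(6500.0, curve):.2f} pmol/well")
```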
B. Reporter-Based Assays

1. CREB Reporter Assay (Gs-associated receptors)
A method to detect Gs stimulation depends on the known property of the transcription
factor CREB, which is activated in a cAMP-dependent manner. A PathDetect CREB trans-
Reporting System (Stratagene, Catalogue # 219010) was utilized to assay for Gs-coupled
activity in 293 or 293T cells. Cells were transfected with the plasmid components of the
above system and the indicated expression plasmid encoding endogenous or mutant receptor
using a Mammalian Transfection Kit (Stratagene, Catalogue #200285) according to the
manufacturer's instructions. Briefly, 400 ng pFR-Luc (luciferase reporter plasmid containing
Gal4 recognition sequences), 40 ng pFA2-CREB (Gal4-CREB fusion protein containing the
Gal4 DNA-binding domain), 80 ng CMV-receptor expression plasmid (comprising the
receptor) and 20 ng CMV-SEAP (secreted alkaline phosphatase expression plasmid; alkaline
phosphatase activity is measured in the media of transfected cells to control for variations in
transfection efficiency between samples) were combined in a calcium phosphate precipitate
as per the Kit's instructions. Half of the precipitate was equally distributed over 3 wells in a
96-well plate, kept on the cells overnight, and replaced with fresh medium the following
morning. Forty-eight (48) hr after the start of the transfection, cells were treated and assayed
for luciferase activity as set forth with respect to the GPR30 system, above. This assay was
used with respect to GHSR.
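The CMV-SEAP plasmid serves as a transfection control. The text does not state how the SEAP readings were combined with the luciferase data; the Python sketch below merely assumes the common approach of dividing each well's luciferase signal by its SEAP signal before comparing receptor constructs. The function names and the readings in the test are hypothetical.

```python
from statistics import mean


def normalize_to_seap(luciferase: list[float], seap: list[float]) -> list[float]:
    """Divide each well's luciferase signal by its SEAP (transfection-control) signal."""
    if len(luciferase) != len(seap):
        raise ValueError("each well needs both a luciferase and a SEAP reading")
    return [luc / alk for luc, alk in zip(luciferase, seap)]


def fold_over_reference(sample: list[float], reference: list[float]) -> float:
    """Ratio of mean normalized signals (e.g. mutant vs. endogenous receptor)."""
    return mean(sample) / mean(reference)


if __name__ == "__main__":
    # Hypothetical triplicate readings for endogenous and mutant receptor wells.
    endogenous = normalize_to_seap([1200.0, 1100.0, 1300.0], [1.0, 0.9, 1.1])
    mutant = normalize_to_seap([2600.0, 2400.0, 2500.0], [1.0, 1.0, 1.0])
    print(f"fold activation: {fold_over_reference(mutant, endogenous):.2f}")
```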

2. AP1 Reporter Assay (Gq-associated receptors)
A method to detect Gq stimulation depends on the known property of Gq-dependent
phospholipase C to cause the activation of genes containing AP1 elements in their promoter.
A PathDetect AP-1 cis-Reporting System (Stratagene, Catalogue # 219073) was utilized
following the protocol set forth above with respect to the CREB reporter assay, except that the
components of the calcium phosphate precipitate were 410 ng pAP1-Luc, 80 ng receptor
expression plasmid, and 20 ng CMV-SEAP. This assay was used with respect to ETBR-LP2.

C. Intracellular IP3 Accumulation Assay

On day 1, cells comprising the serotonin receptors (endogenous and mutated) were
plated onto 24-well plates, usually 1 x 10^5 cells/well. On day 2, cells were transfected by first
mixing 0.25 µg DNA in 50 µl serum-free DMEM/well and 2 µl lipofectamine in 50 µl
serum-free DMEM/well. The solutions were gently mixed and incubated for 15-30 min at
room temperature. Cells were washed with 0.5 ml PBS, and 400 µl of serum-free media was
mixed with the transfection media and added to the cells. The cells were then incubated for
3-4 hr at 37°C/5% CO2, after which the transfection media was removed and replaced with
1 ml/well of regular growth media. On day 3, the cells were labeled with 3H-myo-inositol.
Briefly, the media was removed and the cells were washed with 0.5 ml PBS. Then 0.5 ml
inositol-free/serum-free media (GIBCO BRL) was added per well with 0.25 µCi of 3H-myo-inositol
per well, and the cells were incubated for 16-18 hr (overnight) at 37°C/5% CO2. On day 4, the
cells were washed with 0.5 ml PBS, and either 0.45 ml of assay medium (inositol-free/serum-free
media containing 10 µM pargyline and 10 mM lithium chloride) or 0.4 ml of assay medium plus
50 µl of 10x ketanserin (ket), to a final concentration of 10 µM, was added. The cells were then
incubated for 30 min at 37°C. The cells were then washed with 0.5 ml PBS, and 200 µl of
fresh, ice-cold stop solution (1 M KOH; 18 mM Na-borate; 3.8 mM EDTA) was added per well.
The solution was kept on ice for 5-10 min, or until the cells were lysed, and then neutralized by
200 µl of fresh, ice-cold neutralization solution (7.5% HCl). The lysate was then transferred into
1.5 ml Eppendorf tubes and 1 ml of chloroform/methanol (1:2) was added per tube. The solution
was vortexed for 15 sec and the upper phase was applied to a Biorad AG1-X8 anion
exchange resin (100-200 mesh). First, the resin was washed with water at 1:1.25 w/v and
0.9 ml of upper phase was loaded onto the column. The column was washed with 10 ml of
5 mM myo-inositol and 10 ml of 5 mM Na-borate/60 mM Na-formate. The inositol
trisphosphates were eluted into scintillation vials containing 10 ml of scintillation cocktail with
2 ml of 0.1 M formic acid/1 M ammonium formate. The columns were regenerated by
washing with 10 ml of 0.1 M formic acid/3 M ammonium formate, rinsed twice with ddH2O
and stored at 4°C in water.

Figure 7 provides an illustration of IP3 production from the human 5-HT2A receptor
that incorporates the C322K mutation. While these results evidence that the Proline Mutation
Algorithm approach constitutively activates this receptor, a more robust difference would be
preferred for purposes of using such a receptor to screen for potential therapeutics. However,
because the activated receptor can be utilized for understanding and elucidating the role of
constitutive activation and for the identification of compounds that can be further examined,
we believe that this difference is itself useful in differentiating between the endogenous and
non-endogenous versions of the human 5HT2A receptor.
D. Result Summary

The results for the GPCRs tested are set forth in Table E, where the Per-Cent Difference
indicates the percentage difference in results observed for the non-endogenous GPCR as compared
to the endogenous GPCR; each value is followed by a parenthetical indication of the assay system
utilized (and, where different host cells were used, results for both are listed). As these results
indicate, a variety of assays can be utilized to determine constitutive activity of the non-endogenous
versions of the human GPCRs. Those skilled in the art, based upon the foregoing and with reference
to information available to the art, are credited with the ability to select and/or optimize a
particular assay approach that suits the particular needs of the investigator.



Table E

Receptor Identifier           Per-Cent Difference
(Codon Mutation)              (Assay; Host Cell)

GPR17 (V234K)                 74.5 (CRE-Luc)
GPR30 (L258K)                 71.6 (CREB)
APJ (L247K)                   49.0 (GTPγS)
ETBR-LP2 (N358K)              48.4 (AP1-Luc; 293)
                              61.1 (AP1-Luc; 293T)
GHSR (V262K)                  58.9 (CREB; 293)
                              35.6 (CREB; 293T)
MIG (I230K)                   39 (cAMP)
Serotonin 5-HT2A (C322K)      33.2 (IP3)
Serotonin 5-HT2C (S310K)      39.1 (IP3)

Example 6 

Tissue Distribution of Endogenous Orphan GPCRs 

Using a commercially available human-tissue dot-blot format, endogenous orphan GPCRs
were probed to determine the tissues in which such receptors are localized. Except as indicated
below, the entire receptor cDNA was used as the probe: the radiolabeled probe was generated
from the complete receptor cDNA (excised from the vector) using a Prime-It II™ Random Primer
Labeling Kit (Stratagene, #300385), according to the manufacturer's instructions.
A human RNA Master Blot™ (Clontech, #7770-1) was hybridized with the radiolabeled GPCR
probe and washed under stringent conditions according to the manufacturer's instructions. The
blot was exposed to Kodak BioMax autoradiography film overnight at -80°C.

Representative dot-blot results are presented in Figure 8 for GPR1 (8A), GPR30 (8B) and
APJ (8C); results for all receptors are summarized in Table F.
Table F

GPCR          Tissue Distribution (highest levels, relative to other tissues in the dot-blot)

GPR1          Placenta, Ovary, Adrenal
GPR4          Broad; highest in Heart, Lung, Adrenal, Thyroid, Spinal Cord
GPR5          Placenta, Thymus, Fetal Thymus; lesser levels in Spleen, Fetal Spleen
GPR7          Liver, Spleen, Spinal Cord, Placenta
GPR8          No expression detected
GPR9-6        Thymus, Fetal Thymus; lesser levels in Small Intestine
GPR18         Spleen, Lymph Node, Fetal Spleen, Testis
GPR20         Broad
GPR21         Broad; very low abundance
GPR22         Heart, Fetal Heart; lesser levels in Brain
GPR30         Stomach
GPR31         Broad
BLR1          Spleen
CEPR          Stomach, Liver, Thyroid, Putamen
EBI1          Pancreas; lesser levels in Lymphoid Tissues
EBI2          Lymphoid Tissues, Aorta, Lung, Spinal Cord
ETBR-LP2      Broad; Brain Tissue
GPCR-CNS      Brain; lesser levels in Testis, Placenta
GPR-NGA       Pituitary; lesser levels in Brain
[illegible]   Pituitary
HB954         Aorta, Cerebellum; lesser levels in most other tissues
HM74          Spleen, Leukocytes, Bone Marrow, Mammary Glands, Lung, Trachea
MIG           Low levels in Kidney, Liver, Pancreas, Lung, Spleen
ORG1          Pituitary, Stomach, Placenta
V28           Brain, Spleen, Peripheral Leukocytes
Based upon the foregoing information, it is noted that human GPCRs can also be assessed
for distribution in diseased tissue; comparative assessments between "normal" and diseased tissue
can then be utilized to determine whether a particular receptor is potentially over-expressed or
under-expressed in a diseased state. Where it is desirable to utilize the non-endogenous versions of
the human GPCRs for screening to directly identify candidate compounds of potential therapeutic
relevance, it is noted that inverse agonists are useful in the treatment of diseases and disorders in
which a particular human GPCR is over-expressed, whereas agonists or partial agonists are useful
in the treatment of diseases and disorders in which a particular human GPCR is under-expressed.
As desired, more detailed cellular localization of the receptors can be obtained using
techniques well known to those in the art (e.g., in situ hybridization) to identify the particular
cells within these tissues in which the receptor of interest is expressed.

It is intended that each of the patents, applications, and printed publications mentioned in 
this patent document be hereby incorporated by reference in their entirety. 

As those skilled in the art will appreciate, numerous changes and modifications may be
made to the preferred embodiments of the invention without departing from the spirit of the
invention. It is intended that all such variations fall within the scope of the invention.

Although a variety of expression vectors are available to those in the art, for purposes of
utilization for both the endogenous and non-endogenous human GPCRs, it is most preferred that
the vector utilized be pCMV. This vector was deposited with the American Type Culture
Collection (ATCC) (10801 University Blvd., Manassas, VA 20110-2209 USA) on October 13,
1998 under the provisions of the Budapest Treaty on the International Recognition of the Deposit
of Microorganisms for the Purposes of Patent Procedure. The vector was tested by the ATCC on
____, 1998 and determined to be viable on ____, 1998. The ATCC has assigned the following
deposit number to pCMV: ____.
CLAIMS

What is claimed is: 

1. A constitutively active, non-endogenous version of an endogenous human orphan G protein-
coupled receptor (GPCR) comprising the following amino acid residues (carboxy-terminus to amino-
terminus orientation) traversing the transmembrane-6 (TM6) and intracellular loop-3 (IC3) regions
of the non-endogenous GPCR:

P¹ AA₁₅ X

wherein:

(1) P¹ is an amino acid residue located within the TM6 region of the non-
endogenous GPCR, where P¹ is selected from the group consisting
of (i) the endogenous orphan GPCR proline residue, and (ii) a non-
endogenous amino acid residue other than proline;

(2) AA₁₅ are 15 amino acid residues selected from the group consisting
of (a) the 15 endogenous amino acid residues of the endogenous
orphan GPCR, (b) 15 non-endogenous amino acid residues, and (c)
a combination of 15 amino acid residues, the combination
comprising at least one endogenous amino acid residue of the
endogenous orphan GPCR and at least one non-endogenous amino
acid residue, excepting that none of the 15 endogenous amino acid
residues that are positioned within the TM6 region of the GPCR is
proline; and

(3) X is a non-endogenous amino acid residue located within the IC3 region
of said non-endogenous GPCR.

2. The non-endogenous human GPCR of claim 1 wherein P¹ is the endogenous proline
residue.

3. The non-endogenous human GPCR of claim 1 wherein P¹ is a non-endogenous amino
acid residue other than a proline residue.

4. The non-endogenous human GPCR of claim 1 wherein AA₁₅ are the 15 endogenous
amino acid residues of the endogenous GPCR.

5. The non-endogenous human GPCR of claim 1 wherein X is selected from the group
consisting of lysine, histidine, arginine and alanine residues, excepting that when the
endogenous amino acid in position X of said endogenous human GPCR is lysine, X
is selected from the group consisting of histidine, arginine and alanine.

6. The non-endogenous human GPCR of claim 1 wherein X is a lysine residue, excepting 
that when the endogenous amino acid in position X of said endogenous human GPCR 
is lysine, X is an amino acid other than lysine. 

7. The non-endogenous human GPCR of claim 4 wherein X is a lysine residue, excepting 
that when the endogenous amino acid in position X of said endogenous human GPCR 
is lysine, X is an amino acid other than lysine. 

8. The non-endogenous human GPCR of claim 1 wherein P¹ is a proline residue and X
is a lysine residue, excepting that when the endogenous amino acid in position X of
said endogenous human GPCR is lysine, X is an amino acid other than lysine.

9. A host cell comprising the non-endogenous human GPCR of claim 1.

10. The material of claim 9 wherein said host cell is of mammalian origin.

11. The non-endogenous human GPCR of claim 1 in a purified and isolated form.

12. A nucleic acid sequence encoding a constitutively active, non-endogenous version of
an endogenous human orphan G protein-coupled receptor (GPCR) comprising the following
nucleic acid sequence region traversing the transmembrane-6 (TM6) and intracellular loop-3
(IC3) regions of the orphan GPCR:

3'-Pcodon (AA-codon)₁₅ Xcodon-5'

wherein:

(1) Pcodon is a nucleic acid encoding region within the TM6 region of the
non-endogenous GPCR, where Pcodon encodes an amino acid selected
from the group consisting of (i) the endogenous GPCR proline residue,
and (ii) a non-endogenous amino acid residue other than proline;

(2) (AA-codon)₁₅ are 15 codons encoding 15 amino acid residues selected
from the group consisting of (a) the 15 endogenous amino acid
residues of the endogenous orphan GPCR, (b) 15 non-endogenous
amino acid residues, and (c) a combination of 15 amino acid residues,
the combination comprising at least one endogenous amino acid
residue of the endogenous orphan GPCR and at least one non-
endogenous amino acid residue, excepting that none of the 15
endogenous amino acid residues that are positioned within the TM6
region of the orphan GPCR is proline; and

(3) Xcodon is a nucleic acid encoding region located within the IC3
region of said non-endogenous human GPCR, where Xcodon encodes a
non-endogenous amino acid.

13. The nucleic acid sequence of claim 12 wherein Pcodon encodes an endogenous proline
residue.

14. The nucleic acid sequence of claim 12 wherein Pcodon encodes a non-endogenous
amino acid residue other than a proline residue.

15. The nucleic acid sequence of claim 12 wherein Xcodon encodes a non-endogenous
amino acid selected from the group consisting of lysine, histidine, arginine and
alanine, excepting that when the endogenous amino acid in position X of said
endogenous human GPCR is lysine, Xcodon encodes an amino acid selected from the
group consisting of histidine, arginine and alanine.

16. The nucleic acid sequence of claim 13 wherein Xcodon encodes a non-endogenous
lysine amino acid, excepting that when the endogenous amino acid in position X of
said endogenous human GPCR is lysine, Xcodon encodes an amino acid selected from
the group consisting of histidine, arginine and alanine.

17. The nucleic acid sequence of claim 12 wherein Xcodon is selected from the group
consisting of AAA, AAG, GCA, GCG, GCC and GCU.

18. The nucleic acid sequence of claim 12 wherein X codon is selected from the group 
consisting of AAA and AAG. 

19. The nucleic acid sequence of claim 12 wherein Pcodon is selected from the group
consisting of CCA, CCC, CCG and CCU, and Xcodon is selected from the group
consisting of AAA and AAG.
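Claims 17 to 19 enumerate the permitted codons. As a hedged illustration only, the Python sketch below checks whether a candidate coding sequence satisfies the codon sets of claim 19. It assumes the sequence is written 5' to 3' in DNA letters (the claims' U is read as T), that codon positions are counted from 0, and that the caller already knows which codon encodes the TM6 proline; the function names and the test construct are hypothetical. Because Xcodon lies 16 codons toward the 5' end from Pcodon, with the 15 codons of (AA-codon)₁₅ in between, the check looks back 16 codons.

```python
# Codon sets from claim 19, in DNA letters (the claim text uses U for T).
PROLINE_CODONS = {"CCA", "CCC", "CCG", "CCT"}
LYSINE_CODONS = {"AAA", "AAG"}


def split_codons(cds):
    """Split a 5'->3' coding sequence into consecutive triplets."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def matches_claim_19(cds, proline_codon_index):
    """Return True if the codon at proline_codon_index is a proline codon
    and the codon 16 positions toward the 5' end is a lysine codon."""
    codons = split_codons(cds)
    x_index = proline_codon_index - 16  # 15 intervening codons, then Xcodon
    if x_index < 0 or proline_codon_index >= len(codons):
        return False
    return (codons[proline_codon_index] in PROLINE_CODONS
            and codons[x_index] in LYSINE_CODONS)


# Hypothetical test construct: AAG, fifteen GCT codons, then CCA.
construct = "AAG" + "GCT" * 15 + "CCA"
print(matches_claim_19(construct, 16))  # True
```

The sketch checks only the codon identities of claim 19; it does not verify that the codons actually lie in TM6 and IC3, or that no intervening TM6 residue is proline, both of which require receptor-specific annotation.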

20. A vector comprising the nucleic acid sequence of claim 12. 

21. A plasmid comprising the nucleic acid sequence of claim 12.
22. A host cell comprising the nucleic acid sequence of claim 21.

23. The nucleic acid sequence of claim 12 in a purified and isolated form. 

24. A method for selecting for alteration an endogenous amino acid residue within the
third intracellular loop of a human G protein-coupled receptor ("GPCR"), said receptor
comprising a transmembrane 6 region and an intracellular loop 3 region, which endogenous
amino acid, when altered to a non-endogenous amino acid, constitutively activates said human
GPCR, comprising the following steps:

(a) identifying an endogenous proline residue within the transmembrane 6 region
of a human GPCR;

(b) identifying, by moving in a direction of the carboxy-terminus region of said
GPCR towards the amino-terminus region of said GPCR, the endogenous, 16th
amino acid residue from said proline residue;

(c) altering the endogenous residue of step (b) to a non-endogenous amino acid 
residue to create a non-endogenous version of an endogenous human GPCR; 
and 

(d) determining whether the non-endogenous human GPCR of step (c) is 
constitutively active. 

25. The method of claim 24 wherein the amino acid residue that is two residues from said 
proline residue in the transmembrane 6 region, in a carboxy-terminus to amino- 
terminus direction, is tryptophan. 

26. A constitutively active, non-endogenous human GPCR produced by the process of 
claim 24. 

27. A constitutively active, non-endogenous human GPCR produced by the process of 
claim 25. 

28. An algorithmic approach for creating a non-endogenous, constitutively active version
of an endogenous human G protein coupled receptor (GPCR), said endogenous GPCR
comprising a transmembrane 6 region and an intracellular loop 3 region, the
algorithmic approach comprising the steps of:

(a) selecting an endogenous human GPCR comprising a proline residue in the 
transmembrane-6 region; 

(b) identifying, by counting 16 amino acid residues from the proline residue of 
step (a), in a carboxy-terminus to amino-terminus direction, an endogenous 
amino acid residue; 

(c) altering the identified amino acid residue of step (b) to a non-endogenous 
amino acid residue to create a non-endogenous version of the endogenous 
human GPCR; and 

(d) determining if the non-endogenous version of the endogenous human GPCR 
of step (c) is constitutively active. 

29. The algorithmic approach of claim 28 wherein the amino acid residue that is two 
residues from said proline residue in the transmembrane 6 region, in a carboxy- 
terminus to amino-terminus direction, is tryptophan. 
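Steps (a) to (c) of claims 24 and 28 amount to a simple counting rule on the protein sequence. The Python sketch below is a hedged illustration of that rule, not the authors' software. It assumes one-letter amino acid codes, 0-based indices, and that the caller supplies the position of the TM6 proline (locating TM6 itself is outside the sketch). The lysine-first substitution with a histidine/arginine/alanine fallback follows claims 5 and 6; the function names are hypothetical. For the example, the fragment is residues 241-262 of SEQ ID NO: 2 as printed in the sequence listing, and Pro261 is assumed to be the TM6 proline purely for illustration; the text does not state which residue was mutated in that receptor.

```python
def select_mutation_site(protein, tm6_proline, offset=16):
    """Step (b): index of the residue `offset` positions toward the
    amino terminus from the TM6 proline (0-based indices)."""
    if protein[tm6_proline] != "P":
        raise ValueError("residue at tm6_proline is not proline")
    site = tm6_proline - offset
    if site < 0:
        raise ValueError("counting runs past the amino terminus")
    return site


def has_tryptophan_two_before(protein, tm6_proline):
    """Optional check of claims 25/29: Trp two residues N-terminal to Pro."""
    return tm6_proline >= 2 and protein[tm6_proline - 2] == "W"


def mutate(protein, site, preferred="K", fallback="A"):
    """Step (c): substitute a non-endogenous residue at `site`.
    Lysine is used unless the endogenous residue is already lysine,
    in which case `fallback` (H, R or A per claim 5) is used."""
    new_residue = fallback if protein[site] == preferred else preferred
    return protein[:site] + new_residue + protein[site + 1:]


# Residues 241-262 of SEQ ID NO: 2, assuming Pro261 is the TM6 proline.
fragment = "SSRHFWTILVVVVAFVVCWTPY"
proline = fragment.index("P")                        # 20 (residue 261)
site = select_mutation_site(fragment, proline)
print(site, fragment[site])                          # 4 F (residue 245)
print(has_tryptophan_two_before(fragment, proline))  # True
print(mutate(fragment, site))                        # SSRHKWTILVVVVAFVVCWTPY
```

Step (d), determining whether the mutant receptor is constitutively active, is an experimental assay (for example, those summarized in Table E) and is not modeled here.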

30. A constitutively active, non-endogenous human GPCR produced by the algorithmic 
approach of claim 28. 

31. A constitutively active, non-endogenous human GPCR produced by the algorithmic 
approach of claim 29. 

32. A method for directly identifying a compound selected from the group consisting of
inverse agonists, agonists and partial agonists to a non-endogenous, constitutively
activated human G protein coupled receptor, said receptor comprising a
transmembrane-6 region and an intracellular loop-3 region, comprising the steps of:
(a) selecting an endogenous human GPCR;

(b) identifying a proline residue within the transmembrane-6 region of the GPCR
of step (a);

(c) identifying, in a carboxy-terminus to amino-terminus direction, the
endogenous, 16th amino acid residue from the proline residue of step (b);

(d) altering the endogenous amino acid of step (c) to a non-endogenous amino 
acid; 

(e) confirming that the non-endogenous GPCR of step (d) is constitutively active; 

(f) contacting a candidate compound with the non-endogenous, constitutively- 
activated GPCR of step (e); and 

(g) determining, by measurement of the compound efficacy at said contacted 
receptor, whether said compound is an inverse agonist, agonist or partial 
agonist of said receptor. 
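Step (g) of claim 32 does not specify how compound efficacy is scored. The Python sketch below shows one possible scoring scheme and is an assumption throughout, not the method of the text. It assumes a single assay readout in which a higher signal means more receptor activity, a vehicle-only (basal) measurement of the constitutively active receptor, and a reference full-agonist measurement; the 10% tolerance band and all names are hypothetical choices for illustration.

```python
def classify_compound(basal, with_compound, full_agonist, tolerance=0.1):
    """Classify a test compound from one assay readout (illustrative only).

    basal:         signal of the constitutively active receptor, vehicle only
    with_compound: signal in the presence of the candidate compound
    full_agonist:  signal with a reference full agonist (assumed control)
    tolerance:     fraction of the basal-to-full-agonist window treated as
                   "no effect" (an arbitrary choice for this sketch)
    """
    window = full_agonist - basal
    if window <= 0:
        raise ValueError("full-agonist signal must exceed the basal signal")
    relative = (with_compound - basal) / window  # 0 = basal, 1 = full agonist
    if relative < -tolerance:
        return "inverse agonist"  # lowers the constitutive (basal) activity
    if relative >= 1 - tolerance:
        return "agonist"
    if relative > tolerance:
        return "partial agonist"
    return "no detectable effect"


# Hypothetical readouts (arbitrary units):
print(classify_compound(100, 60, 200))   # inverse agonist
print(classify_compound(100, 150, 200))  # partial agonist
print(classify_compound(100, 195, 200))  # agonist
```

Scoring relative to the constitutive basal signal is what makes inverse agonists directly detectable: with an endogenous, non-constitutively active receptor there is little basal activity for such a compound to reduce.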



33. The method of claim 32 wherein the non-endogenous amino acid of step (d) is lysine.

34. A compound directly identified by the method of claim 32.

35. The method of claim 32 wherein the directly identified compound is an inverse
agonist.

36. The method of claim 32 wherein the directly identified compound is an agonist.

37. The method of claim 32 wherein the directly identified compound is a partial agonist. 

38. A composition comprising the inverse agonist of claim 35. 
39. A composition comprising the agonist of claim 36.

40. A composition comprising the partial agonist of claim 37. 

41. A method for directly identifying an inverse agonist to a non-endogenous,
constitutively activated human G protein coupled receptor ("GPCR"), said GPCR comprising
a transmembrane-6 region and an intracellular loop-3 region, comprising the steps of:

(a) selecting an endogenous human GPCR;

(b) identifying a proline residue within the transmembrane-6 region of the GPCR of
step (a);

(c) identifying, in a carboxy-terminus to amino-terminus direction, the
endogenous, 16th amino acid residue from the proline residue of step (b);

(d) altering the endogenous amino acid of step (c) to a non-endogenous lysine residue;

(e) confirming that the non-endogenous GPCR of step (d) is constitutively active;

(f) contacting a candidate compound with the non-endogenous, constitutively-
activated GPCR of step (e); and

(g) determining, by measurement of the compound efficacy at said contacted receptor,
whether said compound is an inverse agonist of said receptor.



42. An inverse agonist directly identified by the method of claim 41.

43. A composition comprising an inverse agonist of claim 42.



WO 00/22129 



PCT/US99/23938 







FIGURE 3A-3I: pCMV Sequence and Restriction Sites [sequence and restriction-site map not recoverable from the extracted text]
SEQUENCE LISTING



(1) GENERAL INFORMATION:



(i) APPLICANT: Behan, Dominic P.
Chalmers, Derek T.
Liaw, Chen W.



(ii) TITLE OF INVENTION: Non-Endogenous, Constitutively Activated Human G Protein-Coupled Orphan Receptors
(iii) NUMBER OF SEQUENCES: 280



(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Arena Pharmaceuticals, Inc.
(B) STREET: 6166 Nancy Ridge Drive
(C) CITY: San Diego
(D) STATE: CA
(E) COUNTRY: USA
(F) ZIP: 92122



(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30



(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US
(B) FILING DATE:
(C) CLASSIFICATION:



(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Burgoon, Richard P.
(B) REGISTRATION NUMBER: 34,787

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (619) 453-7200
(B) TELEFAX: (619) 453-7210



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1068 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC 60

TATTACTCTC TGGAGTCTGA TTTGGAGGAG AAAGTCCAGC TGGGAGTTGT TCACTGGGTC 120

TCCCTGGTGT TATATTGTTT GGCTTTTGTT CTGGGAATTC CAGGAAATGC CATCGTCATT 180

TGGTTCACGG GGCTCAAGTG GAAGAAGACA GTCACCACTC TGTGGTTCCT CAATCTAGCC 240

ATTGCGGATT TCATTTTTCT TCTCTTTCTG CCCCTGTACA TCTCCTATGT GGCCATGAAT 300

TTCCACTGGC CCTTTGGCAT CTGGCTGTGC AAAGCCAATT CCTTCACTGC CCAGTTGAAC 360

ATGTTTGCCA GTGTTTTTTT CCTGACAGTG ATCAGCCTGG ACCACTATAT CCACTTGATC 420

CATCCTGTCT TATCTCATCG GCATCGAACC CTCAAGAACT CTCTGATTGT CATTATATTC 480

ATCTGGCTTT TGGCTTCTCT AATTGGCGGT CCTGCCCTGT ACTTCCGGGA CACTGTGGAG 540

TTCAATAATC ATACTCTTTG CTATAACAAT TTTCAGAAGC ATGATCCTGA CCTCACTTTG 600

ATCAGGCACC ATGTTCTGAC TTGGGTGAAA TTTATCATTG GCTATCTCTT CCCTTTGCTA 660

ACAATGAGTA TTTGCTACTT GTGTCTCATC TTCAAGGTGA AGAAGCGAAC AGTCCTGATC 720

TCCAGTAGGC ATTTCTGGAC AATTCTGGTT GTGGTTGTGG CCTTTGTGGT TTGCTGGACT 780

CCTTATCACC TGTTTAGCAT TTGGGAGCTC ACCATTCACC ACAATAGCTA TTCCCACCAT 840

GTGATGCAGG CTGGAATCCC CCTCTCCACT GGTTTGGCAT TCCTCAATAG TTGCTTGAAC 900

CCCATCCTTT ATGTCCTAAT TAGTAAGAAG TTCCAAGCTC GCTTCCGGTC CTCAGTTGCT 960

GAGATACTCA AGTACACACT GTGGGAAGTC AGCTGTTCTG GCACAGTGAG TGAACAGCTC 1020

AGGAACTCAG AAACCAAGAA TCTGTGTCTC CTGGAAACAG CTCAATAA 1068
(3) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 355 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Glu Asp Leu Glu Glu Thr Leu Phe Glu Glu Phe Glu Asn Tyr Ser
1               5                   10                  15

Tyr Asp Leu Asp Tyr Tyr Ser Leu Glu Ser Asp Leu Glu Glu Lys Val
            20                  25                  30

Gln Leu Gly Val Val His Trp Val Ser Leu Val Leu Tyr Cys Leu Ala
        35                  40                  45

Phe Val Leu Gly Ile Pro Gly Asn Ala Ile Val Ile Trp Phe Thr Gly
    50                  55                  60

Leu Lys Trp Lys Lys Thr Val Thr Thr Leu Trp Phe Leu Asn Leu Ala
65                  70                  75                  80

Ile Ala Asp Phe Ile Phe Leu Leu Phe Leu Pro Leu Tyr Ile Ser Tyr
                85                  90                  95

Val Ala Met Asn Phe His Trp Pro Phe Gly Ile Trp Leu Cys Lys Ala
            100                 105                 110

Asn Ser Phe Thr Ala Gln Leu Asn Met Phe Ala Ser Val Phe Phe Leu
        115                 120                 125

Thr Val Ile Ser Leu Asp His Tyr Ile His Leu Ile His Pro Val Leu
    130                 135                 140

Ser His Arg His Arg Thr Leu Lys Asn Ser Leu Ile Val Ile Ile Phe
145                 150                 155                 160

Ile Trp Leu Leu Ala Ser Leu Ile Gly Gly Pro Ala Leu Tyr Phe Arg
                165                 170                 175

Asp Thr Val Glu Phe Asn Asn His Thr Leu Cys Tyr Asn Asn Phe Gln
            180                 185                 190

Lys His Asp Pro Asp Leu Thr Leu Ile Arg His His Val Leu Thr Trp
        195                 200                 205

Val Lys Phe Ile Ile Gly Tyr Leu Phe Pro Leu Leu Thr Met Ser Ile
    210                 215                 220

Cys Tyr Leu Cys Leu Ile Phe Lys Val Lys Lys Arg Thr Val Leu Ile
225                 230                 235                 240

Ser Ser Arg His Phe Trp Thr Ile Leu Val Val Val Val Ala Phe Val
                245                 250                 255

Val Cys Trp Thr Pro Tyr His Leu Phe Ser Ile Trp Glu Leu Thr Ile
            260                 265                 270

His His Asn Ser Tyr Ser His His Val Met Gln Ala Gly Ile Pro Leu
        275                 280                 285

Ser Thr Gly Leu Ala Phe Leu Asn Ser Cys Leu Asn Pro Ile Leu Tyr
    290                 295                 300

Val Leu Ile Ser Lys Lys Phe Gln Ala Arg Phe Arg Ser Ser Val Ala
305                 310                 315                 320

Glu Ile Leu Lys Tyr Thr Leu Trp Glu Val Ser Cys Ser Gly Thr Val
                325                 330                 335

Ser Glu Gln Leu Arg Asn Ser Glu Thr Lys Asn Leu Cys Leu Leu Glu
            340                 345                 350

Thr Ala Gln
        355

(4) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1089 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGGGCAACC ACACGTGGGA GGGCTGCCAC GTGGACTCGC GCGTGGACCA CCTCTTTCCG 60

CCATCCCTCT ACATCTTTGT CATCGGCGTG GGGCTGCCCA CCAACTGCCT GGCTCTGTGG 120

GCGGCCTACC GCCAGGTGCA ACAGCGCAAC GAGCTGGGCG TCTACCTGAT GAACCTCAGC 180

ATCGCCGACC TGCTGTACAT CTGCACGCTG CCGCTGTGGG TGGACTACTT CCTGCACCAC 240

GACAACTGGA TCCACGGCCC CGGGTCCTGC AAGCTCTTTG GGTTCATCTT CTACACCAAT 300

ATCTACATCA GCATCGCCTT CCTGTGCTGC ATCTCGGTGG ACCGCTACCT GGCTGTGGCC 360

CACCCACTCC GCTTCGCCCG CCTGCGCCGC GTCAAGACCG CCGTGGCCGT GAGCTCCGTG 420

GTCTGGGCCA CGGAGCTGGG CGCCAACTCG GCGCCCCTGT TCCATGACGA GCTCTTCCGA 480

GACCGCTACA ACCACACCTT CTGCTTTGAG AAGTTCCCCA TGGAAGGCTG GGTGGCCTGG 540

ATGAACCTCT ATCGGGTGTT CGTGGGCTTC CTCTTCCCGT GGGCGCTCAT GCTGCTGTCG 600

TACCGGGGCA TCCTGCGGGC CGTGCGGGGC AGCGTGTCCA CCGAGCGCCA GGAGAAGGCC 660

AAGATCAAGC GGCTGGCCCT CAGCCTCATC GCCATCGTGC TGGTCTGCTT TGCGCCCTAT 720

CACGTGCTCT TGCTGTCCCG CAGCGCCATC TACCTGGGCC GCCCCTGGGA CTGCGGCTTC 780

GAGGAGCGCG TCTTTTCTGC ATACCACAGC TCACTGGCTT TCACCAGCCT CAACTGTGTG 840

GCGGACCCCA TCCTCTACTG CCTGGTCAAC GAGGGCGCCC GCAGCGATGT GGCCAAGGCC 900

CTGCACAACC TGCTCCGCTT TCTGGCCAGC GACAAGCCCC AGGAGATGGC CAATGCCTCG 960

CTCACCCTGG AGACCCCACT CACCTCCAAG AGGAACAGCA CAGCCAAAGC CATGACTGGC 1020

AGCTGGGCGG CCACTCCGCC TTCCCAGGGG GACCAGGTGC AGCTGAAGAT GCTGCCGCCA 1080

GCACAATGA 1089

(5) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Gly Asn His Thr Trp Glu Gly Cys His Val Asp Ser Arg Val Asp
1 5 10 15

His Leu Phe Pro Pro Ser Leu Tyr Ile Phe Val Ile Gly Val Gly Leu
20 25 30

Pro Thr Asn Cys Leu Ala Leu Trp Ala Ala Tyr Arg Gln Val Gln Gln
35 40 45

Arg Asn Glu Leu Gly Val Tyr Leu Met Asn Leu Ser Ile Ala Asp Leu
50 55 60

Leu Tyr Ile Cys Thr Leu Pro Leu Trp Val Asp Tyr Phe Leu His His
65 70 75 80

Asp Asn Trp Ile His Gly Pro Gly Ser Cys Lys Leu Phe Gly Phe Ile
85 90 95

Phe Tyr Thr Asn Ile Tyr Ile Ser Ile Ala Phe Leu Cys Cys Ile Ser
100 105 110

Val Asp Arg Tyr Leu Ala Val Ala His Pro Leu Arg Phe Ala Arg Leu
115 120 125

Arg Arg Val Lys Thr Ala Val Ala Val Ser Ser Val Val Trp Ala Thr
130 135 140

Glu Leu Gly Ala Asn Ser Ala Pro Leu Phe His Asp Glu Leu Phe Arg
145 150 155 160

Asp Arg Tyr Asn His Thr Phe Cys Phe Glu Lys Phe Pro Met Glu Gly
165 170 175

Trp Val Ala Trp Met Asn Leu Tyr Arg Val Phe Val Gly Phe Leu Phe
180 185 190

Pro Trp Ala Leu Met Leu Leu Ser Tyr Arg Gly Ile Leu Arg Ala Val
195 200 205

Arg Gly Ser Val Ser Thr Glu Arg Gln Glu Lys Ala Lys Ile Lys Arg
210 215 220

Leu Ala Leu Ser Leu Ile Ala Ile Val Leu Val Cys Phe Ala Pro Tyr
225 230 235 240

His Val Leu Leu Leu Ser Arg Ser Ala Ile Tyr Leu Gly Arg Pro Trp
245 250 255

Asp Cys Gly Phe Glu Glu Arg Val Phe Ser Ala Tyr His Ser Ser Leu
260 265 270

Ala Phe Thr Ser Leu Asn Cys Val Ala Asp Pro Ile Leu Tyr Cys Leu
275 280 285

Val Asn Glu Gly Ala Arg Ser Asp Val Ala Lys Ala Leu His Asn Leu
290 295 300

Leu Arg Phe Leu Ala Ser Asp Lys Pro Gln Glu Met Ala Asn Ala Ser
305 310 315 320

Leu Thr Leu Glu Thr Pro Leu Thr Ser Lys Arg Asn Ser Thr Ala Lys
325 330 335

Ala Met Thr Gly Ser Trp Ala Ala Thr Pro Pro Ser Gln Gly Asp Gln
340 345 350

Val Gln Leu Lys Met Leu Pro Pro Ala Gln
355 360

(6) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TATGAATTCA GATGCTCTAA ACGTCCCTGC 30

(7) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TCCGGATCCA CCTGCACCTG CGCCTGCACC 30

(8) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ATGGAGTCCT CAGGCAACCC AGAGAGCACC ACCTTTTTTT ACTATGACCT TCAGAGCCAG 60

CCGTGTGAGA ACCAGGCCTG GGTCTTTGCT ACCCTCGCCA CCACTGTCCT GTACTGCCTG 120

GTGTTTCTCC TCAGCCTAGT GGGCAACAGC CTGGTCCTGT GGGTCCTGGT GAAGTATGAG 180

AGCCTGGAGT CCCTCACCAA CATCTTCATC CTCAACCTGT GCCTCTCAGA CCTGGTGTTC 240

GCCTGCTTGT TGCCTGTGTG GATCTCCCCA TACCACTGGG GCTGGGTGCT GGGAGACTTC 300

CTCTGCAAAC TCCTCAATAT GATCTTCTCC ATCAGCCTCT ACAGCAGCAT CTTCTTCCTG 360

ACCATCATGA CCATCCACCG CTACCTGTCG GTAGTGAGCC CCCTCTCCAC CCTGCGCGTC 420

CCCACCCTCC GCTGCCGGGT GCTGGTGACC ATGGCTGTGT GGGTAGCCAG CATCCTGTCC 480

TCCATCCTCG ACACCATCTT CCACAAGGTG CTTTCTTCGG GCTGTGATTA TTCCGAACTC 540

ACGTGGTACC TCACCTCCGT CTACCAGCAC AACCTCTTCT TCCTGCTGTC CCTGGGGATT 600

ATCCTGTTCT GCTACGTGGA GATCCTCAGG ACCCTGTTCC GCTCACGCTC CAAGCGGCGC 660

CACCGCACGG TCAAGCTCAT CTTCGCCATC GTGGTGGCCT ACTTCCTCAG CTGGGGTCCC 720

TACAACTTCA CCCTGTTTCT GCAGACGCTG TTTCGGACCC AGATCATCCG GAGCTGCGAG 780

GCCAAACAGC AGCTAGAATA CGCCCTGCTC ATCTGCCGCA ACCTCGCCTT CTCCCACTGC 840

TGCTTTAACC CGGTGCTCTA TGTCTTCGTG GGGGTCAAGT TCCGCACACA CCTGAAACAT 900

GTTCTCCGGC AGTTCTGGTT CTGCCGGCTG CAGGCACCCA GCCCAGCCTC GATCCCCCAC 960

TCCCCTGGTG CCTTCGCCTA TGAGGGCGCC TCCTTCTACT GA 1002

(9) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Met Glu Ser Ser Gly Asn Pro Glu Ser Thr Thr Phe Phe Tyr Tyr Asp
1 5 10 15

Leu Gln Ser Gln Pro Cys Glu Asn Gln Ala Trp Val Phe Ala Thr Leu
20 25 30

Ala Thr Thr Val Leu Tyr Cys Leu Val Phe Leu Leu Ser Leu Val Gly
35 40 45

Asn Ser Leu Val Leu Trp Val Leu Val Lys Tyr Glu Ser Leu Glu Ser
50 55 60

Leu Thr Asn Ile Phe Ile Leu Asn Leu Cys Leu Ser Asp Leu Val Phe
65 70 75 80

Ala Cys Leu Leu Pro Val Trp Ile Ser Pro Tyr His Trp Gly Trp Val
85 90 95

Leu Gly Asp Phe Leu Cys Lys Leu Leu Asn Met Ile Phe Ser Ile Ser
100 105 110

Leu Tyr Ser Ser Ile Phe Phe Leu Thr Ile Met Thr Ile His Arg Tyr
115 120 125

Leu Ser Val Val Ser Pro Leu Ser Thr Leu Arg Val Pro Thr Leu Arg
130 135 140

Cys Arg Val Leu Val Thr Met Ala Val Trp Val Ala Ser Ile Leu Ser
145 150 155 160

Ser Ile Leu Asp Thr Ile Phe His Lys Val Leu Ser Ser Gly Cys Asp
165 170 175

Tyr Ser Glu Leu Thr Trp Tyr Leu Thr Ser Val Tyr Gln His Asn Leu
180 185 190

Phe Phe Leu Leu Ser Leu Gly Ile Ile Leu Phe Cys Tyr Val Glu Ile
195 200 205

Leu Arg Thr Leu Phe Arg Ser Arg Ser Lys Arg Arg His Arg Thr Val
210 215 220

Lys Leu Ile Phe Ala Ile Val Val Ala Tyr Phe Leu Ser Trp Gly Pro
225 230 235 240

Tyr Asn Phe Thr Leu Phe Leu Gln Thr Leu Phe Arg Thr Gln Ile Ile
245 250 255

Arg Ser Cys Glu Ala Lys Gln Gln Leu Glu Tyr Ala Leu Leu Ile Cys
260 265 270

Arg Asn Leu Ala Phe Ser His Cys Cys Phe Asn Pro Val Leu Tyr Val
275 280 285

WO 00/22129 PCT/US99/23938 



Phe Val Gly Val Lys Phe Arg Thr His Leu Lys His Val Leu Arg Gln
290 295 300

Phe Trp Phe Cys Arg Leu Gln Ala Pro Ser Pro Ala Ser Ile Pro His
305 310 315 320

Ser Pro Gly Ala Phe Ala Tyr Glu Gly Ala Ser Phe Tyr
325 330

(10) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GCAAGCTTGG GGGACGCCAG GTCGCCGGCT 30

(11) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GCGGATCCGG ACGCTGGGGG AGTCAGGCTG C 31
(12) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 987 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

ATGGACAACG CCTCGTTCTC GGAGCCCTGG CCCGCCAACG CATCGGGCCC GGACCCGGCG 60

CTGAGCTGCT CCAACGCGTC GACTCTGGCG CCGCTGCCGG CGCCGCTGGC GGTGGCTGTA 120

CCAGTTGTCT ACGCGGTGAT CTGCGCCGTG GGTCTGGCGG GCAACTCCGC CGTGCTGTAC 180

GTGTTGCTGC GGGCGCCCCG CATGAAGACC GTCACCAACC TGTTCATCCT CAACCTGGCC 240

ATCGCCGACG AGCTCTTCAC GCTGGTGCTG CCCATCAACA TCGCCGACTT CCTGCTGCGG 300

CAGTGGCCCT TCGGGGAGCT CATGTGCAAG CTCATCGTGG CTATCGACCA GTACAACACC 360

TTCTCCAGCC TCTACTTCCT CACCGTCATG AGCGCCGACC GCTACCTGGT GGTGTTGGCC 420

ACTGCGGAGT CGCGCCGGGT GGCCGGCCGC ACCTACAGCG CCGCGCGCGC GGTGAGCCTG 480

GCCGTGTGGG GGATCGTCAC ACTCGTCGTG CTGCCCTTCG CAGTCTTCGC CCGGCTAGAC 540

GACGAGCAGG GCCGGCGCCA GTGCGTGCTA GTCTTTCCGC AGCCCGAGGC CTTCTGGTGG 600

CGCGCGAGCC GCCTCTACAC GCTCGTGCTG GGCTTCGCCA TCCCCGTGTC CACCATCTGT 660

GTCCTCTATA CCACCCTGCT GTGCCGGCTG CATGCCATGC GGCTGGACAG CCACGCCAAG 720

GCCCTGGAGC GCGCCAAGAA GCGGGTGACC TTCCTGGTGG TGGCAATCCT GGCGGTGTGC 780

CTCCTCTGCT GGACGCCCTA CCACCTGAGC ACCGTGGTGG CGCTCACCAC CGACCTCCCG 840

CAGACGCCGC TGGTCATCGC TATCTCCTAC TTCATCACCA GCCTGACGTA CGCCAACAGC 900

TGCCTCAACC CCTTCCTCTA CGCCTTCCTG GACGCCAGCT TCCGCAGGAA CCTCCGCCAG 960

CTGATAACTT GCCGCGCGGC AGCCTGA 987

(13) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 328 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Asp Asn Ala Ser Phe Ser Glu Pro Trp Pro Ala Asn Ala Ser Gly
1 5 10 15

Pro Asp Pro Ala Leu Ser Cys Ser Asn Ala Ser Thr Leu Ala Pro Leu
20 25 30

Pro Ala Pro Leu Ala Val Ala Val Pro Val Val Tyr Ala Val Ile Cys
35 40 45

Ala Val Gly Leu Ala Gly Asn Ser Ala Val Leu Tyr Val Leu Leu Arg
50 55 60

Ala Pro Arg Met Lys Thr Val Thr Asn Leu Phe Ile Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Glu Leu Phe Thr Leu Val Leu Pro Ile Asn Ile Ala Asp
85 90 95

Phe Leu Leu Arg Gln Trp Pro Phe Gly Glu Leu Met Cys Lys Leu Ile
100 105 110

Val Ala Ile Asp Gln Tyr Asn Thr Phe Ser Ser Leu Tyr Phe Leu Thr
115 120 125

Val Met Ser Ala Asp Arg Tyr Leu Val Val Leu Ala Thr Ala Glu Ser
130 135 140

Arg Arg Val Ala Gly Arg Thr Tyr Ser Ala Ala Arg Ala Val Ser Leu
145 150 155 160

Ala Val Trp Gly Ile Val Thr Leu Val Val Leu Pro Phe Ala Val Phe
165 170 175

Ala Arg Leu Asp Asp Glu Gln Gly Arg Arg Gln Cys Val Leu Val Phe
180 185 190

Pro Gln Pro Glu Ala Phe Trp Trp Arg Ala Ser Arg Leu Tyr Thr Leu
195 200 205

Val Leu Gly Phe Ala Ile Pro Val Ser Thr Ile Cys Val Leu Tyr Thr
210 215 220

Thr Leu Leu Cys Arg Leu His Ala Met Arg Leu Asp Ser His Ala Lys
225 230 235 240

Ala Leu Glu Arg Ala Lys Lys Arg Val Thr Phe Leu Val Val Ala Ile
245 250 255

Leu Ala Val Cys Leu Leu Cys Trp Thr Pro Tyr His Leu Ser Thr Val
260 265 270

Val Ala Leu Thr Thr Asp Leu Pro Gln Thr Pro Leu Val Ile Ala Ile
275 280 285

Ser Tyr Phe Ile Thr Ser Leu Thr Tyr Ala Asn Ser Cys Leu Asn Pro
290 295 300

Phe Leu Tyr Ala Phe Leu Asp Ala Ser Phe Arg Arg Asn Leu Arg Gln
305 310 315 320

Leu Ile Thr Cys Arg Ala Ala Ala
325

(14) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CGGAATTCGT CAACGGTCCC AGCTACAATG 30

(15) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ATGGATCCCA GGCCCTTCAG CACCGCAATA T 31

(16) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ATGCAGGCCG CTGGGCACCC AGAGCCCCTT GACAGCAGGG GCTCCTTCTC CCTCCCCACG 60

ATGGGTGCCA ACGTCTCTCA GGACAATGGC ACTGGCCACA ATGCCACCTT CTCCGAGCCA 120

CTGCCGTTCC TCTATGTGCT CCTGCCCGCC GTGTACTCCG GGATCTGTGC TGTGGGGCTG 180

ACTGGCAACA CGGCCGTCAT CCTTGTAATC CTAAGGGCGC CCAAGATGAA GACGGTGACC 240

AACGTGTTCA TCCTGAACCT GGCCGTCGCC GACGGGCTCT TCACGCTGGT ACTGCCCGTC 300

AACATCGCGG AGCACCTGCT GCAGTACTGG CCCTTCGGGG AGCTGCTCTG CAAGCTGGTG 360

CTGGCCGTCG ACCACTACAA CATCTTCTCC AGCATCTACT TCCTAGCCGT GATGAGCGTG 420

GACCGATACC TGGTGGTGCT GGCCACCGTG AGGTCCCGCC ACATGCCCTG GCGCACCTAC 480

CGGGGGGCGA AGGTCGCCAG CCTGTGTGTC TGGCTGGGCG TCACGGTCCT GGTTCTGCCC 540

TTCTTCTCTT TCGCTGGCGT CTACAGCAAC GAGCTGCAGG TCCCAAGCTG TGGGCTGAGC 600

TTCCCGTGGC CCGAGCGGGT CTGGTTCAAG GCCAGCCGTG TCTACACTTT GGTCCTGGGC 660

TTCGTGCTGC CCGTGTGCAC CATCTGTGTG CTCTACACAG ACCTCCTGCG CAGGCTGCGG 720

GCCGTGCGGC TCCGCTCTGG AGCCAAGGCT CTAGGCAAGG CCAGGCGGAA GGTGACCGTC 780

CTGGTCCTCG TCGTGCTGGC CGTGTGCCTC CTCTGCTGGA CGCCCTTCCA CCTGGCCTCT 840

GTCGTGGCCC TGACCACGGA CCTGCCCCAG ACCCCACTGG TCATCAGTAT GTCCTACGTC 900

ATCACCAGCC TCACGTACGC CAACTCGTGC CTGAACCCCT TCCTCTACGC CTTTCTAGAT 960

GACAACTTCC GGAAGAACTT CCGCAGCATA TTGCGGTGCT GA 1002
(17) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Gln Ala Ala Gly His Pro Glu Pro Leu Asp Ser Arg Gly Ser Phe
1 5 10 15

Ser Leu Pro Thr Met Gly Ala Asn Val Ser Gln Asp Asn Gly Thr Gly
20 25 30

His Asn Ala Thr Phe Ser Glu Pro Leu Pro Phe Leu Tyr Val Leu Leu
35 40 45

Pro Ala Val Tyr Ser Gly Ile Cys Ala Val Gly Leu Thr Gly Asn Thr
50 55 60

Ala Val Ile Leu Val Ile Leu Arg Ala Pro Lys Met Lys Thr Val Thr
65 70 75 80

Asn Val Phe Ile Leu Asn Leu Ala Val Ala Asp Gly Leu Phe Thr Leu
85 90 95

Val Leu Pro Val Asn Ile Ala Glu His Leu Leu Gln Tyr Trp Pro Phe
100 105 110

Gly Glu Leu Leu Cys Lys Leu Val Leu Ala Val Asp His Tyr Asn Ile
115 120 125

Phe Ser Ser Ile Tyr Phe Leu Ala Val Met Ser Val Asp Arg Tyr Leu
130 135 140

Val Val Leu Ala Thr Val Arg Ser Arg His Met Pro Trp Arg Thr Tyr
145 150 155 160

Arg Gly Ala Lys Val Ala Ser Leu Cys Val Trp Leu Gly Val Thr Val
165 170 175

Leu Val Leu Pro Phe Phe Ser Phe Ala Gly Val Tyr Ser Asn Glu Leu
180 185 190

Gln Val Pro Ser Cys Gly Leu Ser Phe Pro Trp Pro Glu Arg Val Trp
195 200 205

Phe Lys Ala Ser Arg Val Tyr Thr Leu Val Leu Gly Phe Val Leu Pro
210 215 220

Val Cys Thr Ile Cys Val Leu Tyr Thr Asp Leu Leu Arg Arg Leu Arg
225 230 235 240

Ala Val Arg Leu Arg Ser Gly Ala Lys Ala Leu Gly Lys Ala Arg Arg
245 250 255

Lys Val Thr Val Leu Val Leu Val Val Leu Ala Val Cys Leu Leu Cys
260 265 270

Trp Thr Pro Phe His Leu Ala Ser Val Val Ala Leu Thr Thr Asp Leu
275 280 285

Pro Gln Thr Pro Leu Val Ile Ser Met Ser Tyr Val Ile Thr Ser Leu
290 295 300

Thr Tyr Ala Asn Ser Cys Leu Asn Pro Phe Leu Tyr Ala Phe Leu Asp
305 310 315 320

Asp Asn Phe Arg Lys Asn Phe Arg Ser Ile Leu Arg Cys
325 330

(18) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ACGAATTCAG CCATGGTCCT TGAGGTGAGT GACCACCAAG TGCTAAAT 48

(19) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GAGGATCCTG GAATGCGGGG AAGTCAG 27
(20) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1107 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGGTCCTTG AGGTGAGTGA CCACCAAGTG CTAAATGACG CCGAGGTTGC CGCCCTCCTG 60

GAGAACTTCA GCTCTTCCTA TGACTATGGA GAAAACGAGA GTGACTCGTG CTGTACCTCC 120

CCGCCCTGCC CACAGGACTT CAGCCTGAAC TTCGACCGGG CCTTCCTGCC AGCCCTCTAC 180

AGCCTCCTCT TTCTGCTGGG GCTGCTGGGC AACGGCGCGG TGGCAGCCGT GCTGCTGAGC 240

CGGCGGACAG CCCTGAGCAG CACCGACACC TTCCTGCTCC ACCTAGCTGT AGCAGACACG 300

CTGCTGGTGC TGACACTGCC GCTCTGGGCA GTGGACGCTG CCGTCCAGTG GGTCTTTGGC 360

TCTGGCCTCT GCAAAGTGGC AGGTGCCCTC TTCAACATCA ACTTCTACGC AGGAGCCCTC 420

CTGCTGGCCT GCATCAGCTT TGACCGCTAC CTGAACATAG TTCATGCCAC CCAGCTCTAC 480

CGCCGGGGGC CCCCGGCCCG CGTGACCCTC ACCTGCCTGG CTGTCTGGGG GCTCTGCCTG 540

CTTTTCGCCC TCCCAGACTT CATCTTCCTG TCGGCCCACC ACGACGAGCG CCTCAACGCC 600

ACCCACTGCC AATACAACTT CCCACAGGTG GGCCGCACGG CTCTGCGGGT GCTGCAGCTG 660

GTGGCTGGCT TTCTGCTGCC CCTGCTGGTC ATGGCCTACT GCTATGCCCA CATCCTGGCC 720

GTGCTGCTGG TTTCCAGGGG CCAGCGGCGC CTGCGGGCCA TGCGGCTGGT GGTGGTGGTC 780

GTGGTGGCCT TTGCCCTCTG CTGGACCCCC TATCACCTGG TGGTGCTGGT GGACATCCTC 840

ATGGACCTGG GCGCTTTGGC CCGCAACTGT GGCCGAGAAA GCAGGGTAGA CGTGGCCAAG 900

TCGGTCACCT CAGGCCTGGG CTACATGCAC TGCTGCCTCA ACCCGCTGCT CTATGCCTTT 960

GTAGGGGTCA AGTTCCGGGA GCGGATGTGG ATGCTGCTCT TGCGCCTGGG CTGCCCCAAC 1020

CAGAGAGGGC TCCAGAGGCA GCCATCGTCT TCCCGCCGGG ATTCATCCTG GTCTGAGACC 1080

TCAGAGGCCT CCTACTCGGG CTTGTGA 1107 
(21) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Val Leu Glu Val Ser Asp His Gln Val Leu Asn Asp Ala Glu Val
1 5 10 15

Ala Ala Leu Leu Glu Asn Phe Ser Ser Ser Tyr Asp Tyr Gly Glu Asn
20 25 30

Glu Ser Asp Ser Cys Cys Thr Ser Pro Pro Cys Pro Gln Asp Phe Ser
35 40 45

Leu Asn Phe Asp Arg Ala Phe Leu Pro Ala Leu Tyr Ser Leu Leu Phe
50 55 60

Leu Leu Gly Leu Leu Gly Asn Gly Ala Val Ala Ala Val Leu Leu Ser
65 70 75 80

Arg Arg Thr Ala Leu Ser Ser Thr Asp Thr Phe Leu Leu His Leu Ala
85 90 95

Val Ala Asp Thr Leu Leu Val Leu Thr Leu Pro Leu Trp Ala Val Asp
100 105 110

Ala Ala Val Gln Trp Val Phe Gly Ser Gly Leu Cys Lys Val Ala Gly
115 120 125

Ala Leu Phe Asn Ile Asn Phe Tyr Ala Gly Ala Leu Leu Leu Ala Cys
130 135 140

Ile Ser Phe Asp Arg Tyr Leu Asn Ile Val His Ala Thr Gln Leu Tyr
145 150 155 160

Arg Arg Gly Pro Pro Ala Arg Val Thr Leu Thr Cys Leu Ala Val Trp
165 170 175

Gly Leu Cys Leu Leu Phe Ala Leu Pro Asp Phe Ile Phe Leu Ser Ala
180 185 190

His His Asp Glu Arg Leu Asn Ala Thr His Cys Gln Tyr Asn Phe Pro
195 200 205

Gln Val Gly Arg Thr Ala Leu Arg Val Leu Gln Leu Val Ala Gly Phe
210 215 220

Leu Leu Pro Leu Leu Val Met Ala Tyr Cys Tyr Ala His Ile Leu Ala
225 230 235 240

Val Leu Leu Val Ser Arg Gly Gln Arg Arg Leu Arg Ala Met Arg Leu
245 250 255

Val Val Val Val Val Val Ala Phe Ala Leu Cys Trp Thr Pro Tyr His
260 265 270

Leu Val Val Leu Val Asp Ile Leu Met Asp Leu Gly Ala Leu Ala Arg
275 280 285

Asn Cys Gly Arg Glu Ser Arg Val Asp Val Ala Lys Ser Val Thr Ser
290 295 300

Gly Leu Gly Tyr Met His Cys Cys Leu Asn Pro Leu Leu Tyr Ala Phe
305 310 315 320

Val Gly Val Lys Phe Arg Glu Arg Met Trp Met Leu Leu Leu Arg Leu
325 330 335

Gly Cys Pro Asn Gln Arg Gly Leu Gln Arg Gln Pro Ser Ser Ser Arg
340 345 350

Arg Asp Ser Ser Trp Ser Glu Thr Ser Glu Ala Ser Tyr Ser Gly Leu
355 360 365

(22) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TTAAGCTTGA CCTAATGCCA TCTTGTGTCC 30

(23) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TTGGATCCAA AAGAACCATG CACCTCAGAG 30

(24) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1074 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

ATGGCTGATG ACTATGGCTC TGAATCCACA TCTTCCATGG AAGACTACGT TAACTTCAAC 60

TTCACTGACT TCTACTGTGA GAAAAACAAT GTCAGGCAGT TTGCGAGCCA TTTCCTCCCA 120

CCCTTGTACT GGCTCGTGTT CATCGTGGGT GCCTTGGGCA ACAGTCTTGT TATCCTTGTC 180

TACTGGTACT GCACAAGAGT GAAGACCATG ACCGACATGT TCCTTTTGAA TTTGGCAATT 240

GCTGACCTCC TCTTTCTTGT CACTCTTCCC TTCTGGGCCA TTGCTGCTGC TGACCAGTGG 300

AAGTTCCAGA CCTTCATGTG CAAGGTGGTC AACAGCATGT ACAAGATGAA CTTCTACAGC 360

TGTGTGTTGC TGATCATGTG CATCAGCGTG GACAGGTACA TTGCCATTGC CCAGGCCATG 420

AGAGCACATA CTTGGAGGGA GAAAAGGCTT TTGTACAGCA AAATGGTTTG CTTTACCATC 480

TGGGTATTGG CAGCTGCTCT CTGCATCCCA GAAATCTTAT ACAGCCAAAT CAAGGAGGAA 540

TCCGGCATTG CTATCTGCAC CATGGTTTAC CCTAGCGATG AGAGCACCAA ACTGAAGTCA 600

GCTGTCTTGA CCCTGAAGGT CATTCTGGGG TTCTTCCTTC CCTTCGTGGT CATGGCTTGC 660

TGCTATACCA TCATCATTCA CACCCTGATA CAAGCCAAGA AGTCTTCCAA GCACAAAGCC 720

CTAAAAGTGA CCATCACTGT CCTGACCGTC TTTGTCTTGT CTCAGTTTCC CTACAACTGC 780

ATTTTGTTGG TGCAGACCAT TGACGCCTAT GCCATGTTCA TCTCCAACTG TGCCGTTTCC 840

ACCAACATTG ACATCTGCTT CCAGGTCACC CAGACCATCG CCTTCTTCCA CAGTTGCCTG 900

AACCCTGTTC TCTATGTTTT TGTGGGTGAG AGATTCCGCC GGGATCTCGT GAAAACCCTG 960

AAGAACTTGG GTTGCATCAG CCAGGCCCAG TGGGTTTCAT TTACAAGGAG AGAGGGAAGC 1020

TTGAAGCTGT CGTCTATGTT GCTGGAGACA ACCTCAGGAG CACTCTCCCT CTGA 1074

(25) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 357 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Met Ala Asp Asp Tyr Gly Ser Glu Ser Thr Ser Ser Met Glu Asp Tyr
1 5 10 15

Val Asn Phe Asn Phe Thr Asp Phe Tyr Cys Glu Lys Asn Asn Val Arg
20 25 30

Gln Phe Ala Ser His Phe Leu Pro Pro Leu Tyr Trp Leu Val Phe Ile
35 40 45

Val Gly Ala Leu Gly Asn Ser Leu Val Ile Leu Val Tyr Trp Tyr Cys
50 55 60

Thr Arg Val Lys Thr Met Thr Asp Met Phe Leu Leu Asn Leu Ala Ile
65 70 75 80

Ala Asp Leu Leu Phe Leu Val Thr Leu Pro Phe Trp Ala Ile Ala Ala
85 90 95

Ala Asp Gln Trp Lys Phe Gln Thr Phe Met Cys Lys Val Val Asn Ser
100 105 110

Met Tyr Lys Met Asn Phe Tyr Ser Cys Val Leu Leu Ile Met Cys Ile
115 120 125

Ser Val Asp Arg Tyr Ile Ala Ile Ala Gln Ala Met Arg Ala His Thr
130 135 140

Trp Arg Glu Lys Arg Leu Leu Tyr Ser Lys Met Val Cys Phe Thr Ile
145 150 155 160

Trp Val Leu Ala Ala Ala Leu Cys Ile Pro Glu Ile Leu Tyr Ser Gln
165 170 175

Ile Lys Glu Glu Ser Gly Ile Ala Ile Cys Thr Met Val Tyr Pro Ser
180 185 190

Asp Glu Ser Thr Lys Leu Lys Ser Ala Val Leu Thr Leu Lys Val Ile
195 200 205

Leu Gly Phe Phe Leu Pro Phe Val Val Met Ala Cys Cys Tyr Thr Ile
210 215 220

Ile Ile His Thr Leu Ile Gln Ala Lys Lys Ser Ser Lys His Lys Ala
225 230 235 240

Leu Lys Val Thr Ile Thr Val Leu Thr Val Phe Val Leu Ser Gln Phe
245 250 255

Pro Tyr Asn Cys Ile Leu Leu Val Gln Thr Ile Asp Ala Tyr Ala Met
260 265 270

Phe Ile Ser Asn Cys Ala Val Ser Thr Asn Ile Asp Ile Cys Phe Gln
275 280 285

Val Thr Gln Thr Ile Ala Phe Phe His Ser Cys Leu Asn Pro Val Leu
290 295 300

Tyr Val Phe Val Gly Glu Arg Phe Arg Arg Asp Leu Val Lys Thr Leu
305 310 315 320

Lys Asn Leu Gly Cys Ile Ser Gln Ala Gln Trp Val Ser Phe Thr Arg
325 330 335

Arg Glu Gly Ser Leu Lys Leu Ser Ser Met Leu Leu Glu Thr Thr Ser
340 345 350

Gly Ala Leu Ser Leu
355

(26) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1110 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATGGCCTCAT CGACCACTCG GGGCCCCAGG GTTTCTGACT TATTTTCTGG GCTGCCGCCG 60

GCGGTCACAA CTCCCGCCAA CCAGAGCGCA GAGGCCTCGG CGGGCAACGG GTCGGTGGCT 120

GGCGCGGACG CTCCAGCCGT CACGCCCTTC CAGAGCCTGC AGCTGGTGCA TCAGCTGAAG 180

GGGCTGATCG TGCTGCTCTA CAGCGTCGTG GTGGTCGTGG GGCTGGTGGG CAACTGCCTG 240

CTGGTGCTGG TGATCGCGCG GGTGCCGCGG CTGCACAACG TGACGAACTT CCTCATCGGC 300

AACCTGGCCT TGTCCGACGT GCTCATGTGC ACCGCCTGCG TGCCGCTCAC GCTGGCCTAT 360

GCCTTCGAGC CACGCGGCTG GGTGTTCGGC GGCGGCCTGT GCCACCTGGT CTTCTTCCTG 420

CAGCCGGTCA CCGTCTATGT GTCGGTGTTC ACGCTCACCA CCATCGCAGT GGACCGCTAC 480

GTCGTGCTGG TGCACCCGCT GAGGCGCGCA TCTCGCTGCG CCTCAGCCTA CGCTGTGCTG 540

GCCATCTGGG CGCTGTCCGC GGTGCTGGCG CTGCCGCCCG CCGTGCACAC CTATCACGTG 600

GAGCTCAAGC CGCACGACGT GCGCCTCTGC GAGGAGTTCT GGGGCTCCCA GGAGCGCCAG 660

CGCCAGCTCT ACGCCTGGGG GCTGCTGCTG GTCACCTACC TGCTCCCTCT GCTGGTCATC 720

CTCCTGTCTT ACGTCCGGGT GTCAGTGAAG CTCCGCAACC GCGTGGTGCC GGGCTGCGTG 780

ACCCAGAGCC AGGCCGACTG GGACCGCGCT CGGCGCCGGC GCACCTTCTG CTTGCTGGTG 840

GTGGTCGTGG TGGTGTTCGC CGTCTGCTGG CTGCCGCTGC ACGTCTTCAA CCTGCTGCGG 900

GACCTCGACC CCCACGCCAT CGACCCTTAC GCCTTTGGGC TGGTGCAGCT GCTCTGCCAC 960

TGGCTCGCCA TGAGTTCGGC CTGCTACAAC CCCTTCATCT ACGCCTGGCT GCACGACAGC 1020

TTCCGCGAGG AGCTGCGCAA ACTGTTGGTC GCTTGGCCCC GCAAGATAGC CCCCCATGGC 1080

CAGAATATGA CCGTCAGCGT GGTCATCTGA 1110
(27) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 369 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Met Ala Ser Ser Thr Thr Arg Gly Pro Arg Val Ser Asp Leu Phe Ser
1 5 10 15

Gly Leu Pro Pro Ala Val Thr Thr Pro Ala Asn Gln Ser Ala Glu Ala
20 25 30

Ser Ala Gly Asn Gly Ser Val Ala Gly Ala Asp Ala Pro Ala Val Thr
35 40 45

Pro Phe Gln Ser Leu Gln Leu Val His Gln Leu Lys Gly Leu Ile Val
50 55 60

Leu Leu Tyr Ser Val Val Val Val Val Gly Leu Val Gly Asn Cys Leu
65 70 75 80

Leu Val Leu Val Ile Ala Arg Val Pro Arg Leu His Asn Val Thr Asn
85 90 95

Phe Leu Ile Gly Asn Leu Ala Leu Ser Asp Val Leu Met Cys Thr Ala
100 105 110

Cys Val Pro Leu Thr Leu Ala Tyr Ala Phe Glu Pro Arg Gly Trp Val
115 120 125

Phe Gly Gly Gly Leu Cys His Leu Val Phe Phe Leu Gln Pro Val Thr
130 135 140

Val Tyr Val Ser Val Phe Thr Leu Thr Thr Ile Ala Val Asp Arg Tyr
145 150 155 160

Val Val Leu Val His Pro Leu Arg Arg Ala Ser Arg Cys Ala Ser Ala
165 170 175

Tyr Ala Val Leu Ala Ile Trp Ala Leu Ser Ala Val Leu Ala Leu Pro
180 185 190

Pro Ala Val His Thr Tyr His Val Glu Leu Lys Pro His Asp Val Arg
195 200 205

Leu Cys Glu Glu Phe Trp Gly Ser Gln Glu Arg Gln Arg Gln Leu Tyr
210 215 220

Ala Trp Gly Leu Leu Leu Val Thr Tyr Leu Leu Pro Leu Leu Val Ile
225 230 235 240

Leu Leu Ser Tyr Val Arg Val Ser Val Lys Leu Arg Asn Arg Val Val
245 250 255

Pro Gly Cys Val Thr Gln Ser Gln Ala Asp Trp Asp Arg Ala Arg Arg
260 265 270

Arg Arg Thr Phe Cys Leu Leu Val Val Val Val Val Val Phe Ala Val
275 280 285

Cys Trp Leu Pro Leu His Val Phe Asn Leu Leu Arg Asp Leu Asp Pro
290 295 300

His Ala Ile Asp Pro Tyr Ala Phe Gly Leu Val Gln Leu Leu Cys His
305 310 315 320

Trp Leu Ala Met Ser Ser Ala Cys Tyr Asn Pro Phe Ile Tyr Ala Trp
325 330 335

Leu His Asp Ser Phe Arg Glu Glu Leu Arg Lys Leu Leu Val Ala Trp
340 345 350

Pro Arg Lys Ile Ala Pro His Gly Gln Asn Met Thr Val Ser Val Val
355 360 365

Ile

(28) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1083 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
ATGGACCCAG AAGAAACTTC AGTTTATTTG GATTATTACT ATGCTACGAG CCCAAACTCT 60

GACATCAGGG AGACCCACTC CCATGTTCCT TACACCTCTG TCTTCCTTCC AGTCTTTTAC 120

ACAGCTGTGT TCCTGACTGG AGTGCTGGGG AACCTTGTTC TCATGGGAGC GTTGCATTTC 180

AAACCCGGCA GCCGAAGACT GATCGACATC TTTATCATCA ATCTGGCTGC CTCTGACTTC 240

ATTTTTCTTG TCACATTGCC TCTCTGGGTG GATAAAGAAG CATCTCTAGG ACTGTGGAGG 300

ACGGGCTCCT TCCTGTGCAA AGGGAGCTCC TACATGATCT CCGTCAATAT GCACTGCAGT 360

GTCCTCCTGC TCACTTGCAT GAGTGTTGAC CGCTACCTGG CCATTGTGTG GCCAGTCGTA 420

TCCAGGAAAT TCAGAAGGAC AGACTGTGCA TATGTAGTCT GTGCCAGCAT CTGGTTTATC 480

TCCTGCCTGC TGGGGTTGCC TACTCTTCTG TCCAGGGAGC TCACGCTGAT TGATGATAAG 540

CCATACTGTG CAGAGAAAAA GGCAACTCCA ATTAAACTCA TATGGTCCCT GGTGGCCTTA 600

ATTTTCACCT TTTTTGTCCC TTTGTTGAGC ATTGTGACCT GCTACTGTTG CATTGCAAGG 660

AAGCTGTGTG CCCATTACCA GCAATCAGGA AAGCACAACA AAAAGCTGAA GAAATCTATA 720

AAGATCATCT TTATTGTCGT GGCAGCCTTT CTTGTCTCCT GGCTGCCCTT CAATACTTTC 780

AAGTTCCTGG CCATTGTCTC TGGGTTGCGG CAAGAACACT ATTTACCCTC AGCTATTCTT 840

CAGCTTGGTA TGGAGGTGAG TGGACCCTTG GCATTTGCCA ACAGCTGTGT CAACCCTTTC 900

ATTTACTATA TCTTCGACAG CTACATCCGC CGGGCCATTG TCCACTGCTT GTGCCCTTGC 960

CTGAAAAACT ATGACTTTGG GAGTAGCACT GAGACATCAG ATAGTCACCT CACTAAGGCT 1020

CTCTCCACCT TCATTCATGC AGAAGATTTT GCCAGGAGGA GGAAGAGGTC TGTGTCACTC 1080

TAA 1083

(29) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 360 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Met Asp Pro Glu Glu Thr Ser Val Tyr Leu Asp Tyr Tyr Tyr Ala Thr
1 5 10 15

Ser Pro Asn Ser Asp Ile Arg Glu Thr His Ser His Val Pro Tyr Thr
20 25 30

Ser Val Phe Leu Pro Val Phe Tyr Thr Ala Val Phe Leu Thr Gly Val
35 40 45

Leu Gly Asn Leu Val Leu Met Gly Ala Leu His Phe Lys Pro Gly Ser
50 55 60

Arg Arg Leu Ile Asp Ile Phe Ile Ile Asn Leu Ala Ala Ser Asp Phe
65 70 75 80

Ile Phe Leu Val Thr Leu Pro Leu Trp Val Asp Lys Glu Ala Ser Leu
85 90 95

Gly Leu Trp Arg Thr Gly Ser Phe Leu Cys Lys Gly Ser Ser Tyr Met
100 105 110

Ile Ser Val Asn Met His Cys Ser Val Leu Leu Leu Thr Cys Met Ser
115 120 125

Val Asp Arg Tyr Leu Ala Ile Val Trp Pro Val Val Ser Arg Lys Phe
130 135 140

Arg Arg Thr Asp Cys Ala Tyr Val Val Cys Ala Ser Ile Trp Phe Ile
145 150 155 160

Ser Cys Leu Leu Gly Leu Pro Thr Leu Leu Ser Arg Glu Leu Thr Leu
165 170 175

Ile Asp Asp Lys Pro Tyr Cys Ala Glu Lys Lys Ala Thr Pro Ile Lys
180 185 190

Leu Ile Trp Ser Leu Val Ala Leu Ile Phe Thr Phe Phe Val Pro Leu
195 200 205

Leu Ser Ile Val Thr Cys Tyr Cys Cys Ile Ala Arg Lys Leu Cys Ala
210 215 220

His Tyr Gln Gln Ser Gly Lys His Asn Lys Lys Leu Lys Lys Ser Ile
225 230 235 240

Lys Ile Ile Phe Ile Val Val Ala Ala Phe Leu Val Ser Trp Leu Pro
245 250 255

Phe Asn Thr Phe Lys Phe Leu Ala Ile Val Ser Gly Leu Arg Gln Glu
260 265 270

His Tyr Leu Pro Ser Ala Ile Leu Gln Leu Gly Met Glu Val Ser Gly
275 280 285

Pro Leu Ala Phe Ala Asn Ser Cys Val Asn Pro Phe Ile Tyr Tyr Ile
290 295 300

Phe Asp Ser Tyr Ile Arg Arg Ala Ile Val His Cys Leu Cys Pro Cys
305 310 315 320

Leu Lys Asn Tyr Asp Phe Gly Ser Ser Thr Glu Thr Ser Asp Ser His
325 330 335

Leu Thr Lys Ala Leu Ser Thr Phe Ile His Ala Glu Asp Phe Ala Arg
340 345 350

Arg Arg Lys Arg Ser Val Ser Leu
355 360

(30) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CTAGAATTCT GACTCCAGCC AAAGCATGAA T 31 

(31) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

GCTGGATCCT AAACAGTCTG CGCTCGGCCT 30 

(32) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1020 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

ATGAATGGCC TTGAAGTGGC TCCCCCAGGT CTGATCACCA ACTTCTCCCT GGCCACGGCA 60

GAGCAATGTG GCCAGGAGAC GCCACTGGAG AACATGCTGT TCGCCTCCTT CTACCTTCTG 120

GATTTTATCC TGGCTTTAGT TGGCAATACC CTGGCTCTGT GGCTTTTCAT CCGAGACCAC 180

AAGTCCGGGA CCCCGGCCAA CGTGTTCCTG ATGCATCTGG CCGTGGCCGA CTTGTCGTGC 240

GTGCTGGTCC TGCCCACCCG CCTGGTCTAC CACTTCTCTG GGAACCACTG GCCATTTGGG 300

GAAATCGCAT GCCGTCTCAC CGGCTTCCTC TTCTACCTCA ACATGTACGC CAGCATCTAC 360

TTCCTCACCT GCATCAGCGC CGACCGTTTC CTGGCCATTG TGCACCCGGT CAAGTCCCTC 420

AAGCTCCGCA GGCCCCTCTA CGCACACCTG GCCTGTGCCT TCCTGTGGGT GGTGGTGGCT 480

GTGGCCATGG CCCCGCTGCT GGTGAGCCCA CAGACCGTGC AGACCAACCA CACGGTGGTC 540

TGCCTGCAGC TGTACCGGGA GAAGGCCTCC CACCATGCCC TGGTGTCCCT GGCAGTGGCC 600

TTCACCTTCC CGTTCATCAC CACGGTCACC TGCTACCTGC TGATCATCCG CAGCCTGCGG 660

CAGGGCCTGC GTGTGGAGAA GCGCCTCAAG ACCAAGGCAG TGCGCATGAT CGCCATAGTG 720

CTGGCCATCT TCCTGGTCTG CTTCGTGCCC TACCACGTCA ACCGCTCCGT CTACGTGCTG 780

CACTACCGCA GCCATGGGGC CTCCTGCGCC ACCCAGCGCA TCCTGGCCCT GGCAAACCGC 840

ATCACCTCCT GCCTCACCAG CCTCAACGGG GCACTCGACC CCATCATGTA TTTCTTCGTG 900

GCTGAGAAGT TCCGCCACGC CCTGTGCAAC TTGCTCTGTG GCAAAAGGCT CAAGGGCCCG 960

CCCCCCAGCT TCGAAGGGAA AACCAACGAG AGCTCGCTGA GTGCCAAGTC AGAGCTGTGA 1020
(33) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Asn Gly Leu Glu Val Ala Pro Pro Gly Leu Ile Thr Asn Phe Ser
1 5 10 15

Leu Ala Thr Ala Glu Gln Cys Gly Gln Glu Thr Pro Leu Glu Asn Met
20 25 30

Leu Phe Ala Ser Phe Tyr Leu Leu Asp Phe Ile Leu Ala Leu Val Gly
35 40 45

Asn Thr Leu Ala Leu Trp Leu Phe Ile Arg Asp His Lys Ser Gly Thr
50 55 60

Pro Ala Asn Val Phe Leu Met His Leu Ala Val Ala Asp Leu Ser Cys
65 70 75 80

Val Leu Val Leu Pro Thr Arg Leu Val Tyr His Phe Ser Gly Asn His
85 90 95

Trp Pro Phe Gly Glu Ile Ala Cys Arg Leu Thr Gly Phe Leu Phe Tyr
100 105 110

Leu Asn Met Tyr Ala Ser Ile Tyr Phe Leu Thr Cys Ile Ser Ala Asp
115 120 125

Arg Phe Leu Ala Ile Val His Pro Val Lys Ser Leu Lys Leu Arg Arg
130 135 140

Pro Leu Tyr Ala His Leu Ala Cys Ala Phe Leu Trp Val Val Val Ala
145 150 155 160

Val Ala Met Ala Pro Leu Leu Val Ser Pro Gln Thr Val Gln Thr Asn
165 170 175

His Thr Val Val Cys Leu Gln Leu Tyr Arg Glu Lys Ala Ser His His
180 185 190

Ala Leu Val Ser Leu Ala Val Ala Phe Thr Phe Pro Phe Ile Thr Thr
195 200 205

Val Thr Cys Tyr Leu Leu Ile Ile Arg Ser Leu Arg Gln Gly Leu Arg
210 215 220

Val Glu Lys Arg Leu Lys Thr Lys Ala Val Arg Met Ile Ala Ile Val
225 230 235 240

Leu Ala Ile Phe Leu Val Cys Phe Val Pro Tyr His Val Asn Arg Ser
245 250 255

Val Tyr Val Leu His Tyr Arg Ser His Gly Ala Ser Cys Ala Thr Gln
260 265 270

Arg Ile Leu Ala Leu Ala Asn Arg Ile Thr Ser Cys Leu Thr Ser Leu
275 280 285

Asn Gly Ala Leu Asp Pro Ile Met Tyr Phe Phe Val Ala Glu Lys Phe
290 295 300

Arg His Ala Leu Cys Asn Leu Leu Cys Gly Lys Arg Leu Lys Gly Pro
305 310 315 320

Pro Pro Ser Phe Glu Gly Lys Thr Asn Glu Ser Ser Leu Ser Ala Lys
325 330 335

Ser Glu Leu

(34) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
ATAAGATGAT CACCCTGAAC AATCAAGAT 29

(35) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

TCCGAATTCA TAACATTTCA CTGTTTATAT TGC 33
(36) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 996 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ATGATCACCC TGAACAATCA AGATCAACCT GTCACTTTTA ACAGCTCACA TCCAGATGAA 60

TACAAAATTG CAGCCCTTGT CTTCTATAGC TGTATCTTCA TAATTGGATT ATTTGTTAAC 120

ATCACTGCAT TATGGGTTTT CAGTTGTACC ACCAAGAAGA GAACCACGGT AACCATCTAT 180

ATGATGAATG TGGCATTAGT GGACTTGATA TTTATAATGA CTTTACCCTT TCGAATGTTT 240

TATTATGCAA AAGATGCATG GCCATTTGGA GAGTACTTCT GCCAGATTAT TGGAGCTCTC 300

ACAGTGTTTT ACCCAAGCAT TGCTTTATGG CTTCTTGCCT TTATTAGTGC TGACAGATAC 360

ATGGCCATTG TACAGCCGAA GTACGCCAAA GAACTTAAAA ACACGTGCAA AGCCGTGCTG 420

GCGTGTGTGG GAGTCTGGAT AATGACCCTG ACCACGACCA CCCCTCTGCT ACTGCTCTAT 480

AAAGACCCAG ATAAAGACTC CACTCCCGCC ACCTGCCTCA AGATTTCTGA CATCATCTAT 540

CTAAAAGCTG TGAACGTGCT GAACCTCACT CGACTGACAT TTTTTTTCTT GATTCCTTTG 600

TTCATCATGA TTGGGTGCTA CTTGGTCATT ATTCATAATC TCCTTCACGG CAGGACGTCT 660

AAGCTGAAAC CCAAAGTCAA GGAGAAGTCC ATAAGGATCA TCATCACGCT GCTGGTGCAG 720

GTGCTCGTCT GCTTTATGCC CTTCCACATC TGTTTCGCTT TCCTGATGCT GGGAACGGGG 780

GAGAACAGTT ACAATCCCTG GGGAGCCTTT ACCACCTTCC TCATGAACCT CAGCACGTGT 840

CTGGATGTGA TTCTCTACTA CATCGTTTCA AAACAATTTC AGGCTCGAGT CATTAGTGTC 900

ATGCTATACC GTAATTACCT TCGAAGCCTG CGCAGAAAAA GTTTCCGATC TGGTAGTCTA 960

AGGTCACTAA GCAATATAAA CAGTGAAATG TTATGA 996
(37) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 331 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Ile Thr Leu Asn Asn Gln Asp Gln Pro Val Thr Phe Asn Ser Ser
1 5 10 15

His Pro Asp Glu Tyr Lys Ile Ala Ala Leu Val Phe Tyr Ser Cys Ile
20 25 30

Phe Ile Ile Gly Leu Phe Val Asn Ile Thr Ala Leu Trp Val Phe Ser
35 40 45

Cys Thr Thr Lys Lys Arg Thr Thr Val Thr Ile Tyr Met Met Asn Val
50 55 60

Ala Leu Val Asp Leu Ile Phe Ile Met Thr Leu Pro Phe Arg Met Phe
65 70 75 80

Tyr Tyr Ala Lys Asp Ala Trp Pro Phe Gly Glu Tyr Phe Cys Gln Ile
85 90 95

Ile Gly Ala Leu Thr Val Phe Tyr Pro Ser Ile Ala Leu Trp Leu Leu
100 105 110

Ala Phe Ile Ser Ala Asp Arg Tyr Met Ala Ile Val Gln Pro Lys Tyr
115 120 125

Ala Lys Glu Leu Lys Asn Thr Cys Lys Ala Val Leu Ala Cys Val Gly
130 135 140

Val Trp Ile Met Thr Leu Thr Thr Thr Thr Pro Leu Leu Leu Leu Tyr
145 150 155 160

Lys Asp Pro Asp Lys Asp Ser Thr Pro Ala Thr Cys Leu Lys Ile Ser
165 170 175

Asp Ile Ile Tyr Leu Lys Ala Val Asn Val Leu Asn Leu Thr Arg Leu
180 185 190

Thr Phe Phe Phe Leu Ile Pro Leu Phe Ile Met Ile Gly Cys Tyr Leu
195 200 205

Val Ile Ile His Asn Leu Leu His Gly Arg Thr Ser Lys Leu Lys Pro
210 215 220

Lys Val Lys Glu Lys Ser Ile Arg Ile Ile Ile Thr Leu Leu Val Gln
225 230 235 240

Val Leu Val Cys Phe Met Pro Phe His Ile Cys Phe Ala Phe Leu Met
245 250 255

Leu Gly Thr Gly Glu Asn Ser Tyr Asn Pro Trp Gly Ala Phe Thr Thr
260 265 270

Phe Leu Met Asn Leu Ser Thr Cys Leu Asp Val Ile Leu Tyr Tyr Ile
275 280 285

Val Ser Lys Gln Phe Gln Ala Arg Val Ile Ser Val Met Leu Tyr Arg
290 295 300

Asn Tyr Leu Arg Ser Leu Arg Arg Lys Ser Phe Arg Ser Gly Ser Leu
305 310 315 320

Arg Ser Leu Ser Asn Ile Asn Ser Glu Met Leu
325 330
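Each coding sequence in this listing ends in a stop codon, so the stated protein lengths follow from the nucleotide lengths as (bp / 3) - 1; for example, the 996 bp of SEQ ID NO: 35 give the 331 residues of SEQ ID NO: 36. A minimal Python sketch of such a cross-check follows. It assumes the standard genetic code (NCBI translation table 1), and the helper names `translate` and `protein_length` are hypothetical, not part of the original listing.

```python
# Minimal sketch (assumption: standard genetic code, NCBI table 1) for
# cross-checking a coding sequence against its listed protein.
from itertools import product

BASES = "TCAG"
# One-letter amino acids in TCAG x TCAG x TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Hypothetical helper: translate frame 1, ignoring whitespace, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_length(coding_bp: int) -> int:
    """Hypothetical helper: residues encoded by an ORF of coding_bp nucleotides ending in one stop codon."""
    if coding_bp % 3:
        raise ValueError("coding length must be a multiple of 3")
    return coding_bp // 3 - 1


if __name__ == "__main__":
    # Last 36 nt of SEQ ID NO: 35 encode the C-terminus of SEQ ID NO: 36.
    print(translate("AGGTCACTAA GCAATATAAA CAGTGAAATG TTATGA"))  # RSLSNINSEML
    print(protein_length(996))  # 331
```

The stop codon is deliberately not counted as a residue, which is why the 1083 bp, 1020 bp and 1302 bp entries above correspond to the 360, 339 and 433 amino acid entries.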

(38) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CCAAGCTTCC AGGCCTGGGG TGTGCTGG 28

(39) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
ATGGATCCTG ACCTTCGGCC CCTGGCAGA 29

(40) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1077 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

ATGCCCTCTG TGTCTCCAGC GGGGCCCTCG GCCGGGGCAG TCCCCAATGC CACCGCAGTG 60

ACAACAGTGC GGACCAATGC CAGCGGGCTG GAGGTGCCCC TGTTCCACCT GTTTGCCCGG 120

CTGGACGAGG AGCTGCATGG CACCTTCCCA GGCCTGTGCG TGGCGCTGAT GGCGGTGCAC 180

GGAGCCATCT TCCTGGCAGG GCTGGTGCTC AACGGGCTGG CGCTGTACGT CTTCTGCTGC 240

CGCACCCGGG CCAAGACACC CTCAGTCATC TACACCATCA ACCTGGTGGT GACCGATCTA 300

CTGGTAGGGC TGTCCCTGCC CACGCGCTTC GCTGTGTACT ACGGCGCCAG GGGCTGCCTG 360

CGCTGTGCCT TCCCGCACGT CCTCGGTTAC TTCCTCAACA TGCACTGCTC CATCCTCTTC 420

CTCACCTGCA TCTGCGTGGA CCGCTACCTG GCCATCGTGC GGCCCGAAGG CTCCCGCCGC 480

TGCCGCCAGC CTGCCTGTGC CAGGGCCGTG TGCGCCTTCG TGTGGCTGGC CGCCGGTGCC 540

GTCACCCTGT CGGTGCTGGG CGTGACAGGC AGCCGGCCCT GCTGCCGTGT CTTTGCGCTG 600

ACTGTCCTGG AGTTCCTGCT GCCCCTGCTG GTCATCAGCG TGTTTACCGG CCGCATCATG 660

TGTGCACTGT CGCGGCCGGG TCTGCTCCAC CAGGGTCGCC AGCGCCGCGT GCGGGCCATG 720

CAGCTCCTGC TCACGGTGCT CATCATCTTT CTCGTCTGCT TCACGCCCTT CCACGCCCGC 780

CAAGTGGCCG TGGCGCTGTG GCCCGACATG CCACACCACA CGAGCCTCGT GGTCTACCAC 840

GTGGCCGTGA CCCTCAGCAG CCTCAACAGC TGCATGGACC CCATCGTCTA CTGCTTCGTC 900

ACCAGTGGCT TCCAGGCCAC CGTCCGAGGC CTCTTCGGCC AGCACGGAGA GCGTGAGCCC 960

AGCAGCGGTG ACGTGGTCAG CATGCACAGG AGCTCCAAGG GCTCAGGCCG TCATCACATC 1020

CTCAGTGCCG GCCCTCACGC CCTCACCCAG GCCCTGGCTA ATGGGCCCGA GGCTTAG 1077

(41) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 358 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
Met Pro Ser Val Ser Pro Ala Gly Pro Ser Ala Gly Ala Val Pro Asn
1 5 10 15

Ala Thr Ala Val Thr Thr Val Arg Thr Asn Ala Ser Gly Leu Glu Val 
20 25 30 

Pro Leu Phe His Leu Phe Ala Arg Leu Asp Glu Glu Leu His Gly Thr 
35 40 45 

Phe Pro Gly Leu Cys Val Ala Leu Met Ala Val His Gly Ala Ile Phe
50 55 60

Leu Ala Gly Leu Val Leu Asn Gly Leu Ala Leu Tyr Val Phe Cys Cys
65 70 75 80

Arg Thr Arg Ala Lys Thr Pro Ser Val Ile Tyr Thr Ile Asn Leu Val
85 90 95

Val Thr Asp Leu Leu Val Gly Leu Ser Leu Pro Thr Arg Phe Ala Val
100 105 110

Tyr Tyr Gly Ala Arg Gly Cys Leu Arg Cys Ala Phe Pro His Val Leu
115 120 125

Gly Tyr Phe Leu Asn Met His Cys Ser Ile Leu Phe Leu Thr Cys Ile
130 135 140

Cys Val Asp Arg Tyr Leu Ala Ile Val Arg Pro Glu Ala Pro Ala Ala
145 150 155 160

Cys Arg Gln Pro Ala Cys Ala Arg Ala Val Cys Ala Phe Val Trp Leu
165 170 175

Ala Ala Gly Ala Val Thr Leu Ser Val Leu Gly Val Thr Gly Ser Arg
180 185 190

Pro Cys Cys Arg Val Phe Ala Leu Thr Val Leu Glu Phe Leu Leu Pro
195 200 205

Leu Leu Val Ile Ser Val Phe Thr Gly Arg Ile Met Cys Ala Leu Ser
210 215 220

Arg Pro Gly Leu Leu His Gln Gly Arg Gln Arg Arg Val Arg Ala Met
225 230 235 240

Gln Leu Leu Leu Thr Val Leu Ile Ile Phe Leu Val Cys Phe Thr Pro
245 250 255

Phe His Ala Arg Gln Val Ala Val Ala Leu Trp Pro Asp Met Pro His
260 265 270

His Thr Ser Leu Val Val Tyr His Val Ala Val Thr Leu Ser Ser Leu
275 280 285

Asn Ser Cys Met Asp Pro Ile Val Tyr Cys Phe Val Thr Ser Gly Phe
290 295 300

Gln Ala Thr Val Arg Gly Leu Phe Gly Gln His Gly Glu Arg Glu Pro
305 310 315 320

Ser Ser Gly Asp Val Val Ser Met His Arg Ser Ser Lys Gly Ser Gly
325 330 335

Arg His His Ile Leu Ser Ala Gly Pro His Ala Leu Thr Gln Ala Leu
340 345 350

Ala Asn Gly Pro Glu Ala
355

(42) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GAGAATTCAC TCCTGAGCTC AAGATGAACT 30

(43) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CGGGATCCCC GTAACTGAGC CACTTCAGAT 30

(44) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1050 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

ATGAACTCCA CCTTGGATGG TAATCAGAGC AGCCACCCTT TTTGCCTCTT GGCATTTGGC 60

TATTTGGAAA CTGTCAATTT TTGCCTTTTG GAAGTATTGA TTATTGTCTT TCTAACTGTA 120

TTGATTATTT CTGGCAACAT CATTGTGATT TTTGTATTTC ACTGTGCACC TTTGTTGAAC 180

CATCACACTA CAAGTTATTT TATCCAGACT ATGGCATATG CTGACCTTTT TGTTGGGGTG 240

AGCTGCGTGG TCCCTTCTTT ATCACTCCTC CATCACCCCC TTCCAGTAGA GGAGTCCTTG 300

ACTTGCCAGA TATTTGGTTT TGTAGTATCA GTTCTGAAGA GCGTCTCCAT GGCTTCTCTG 360

GCCTGTATCA GCATTGATAG ATACATTGCC ATTACTAAAC CTTTAACCTA TAATACTCTG 420

GTTACACCCT GGAGACTACG CCTGTGTATT TTCCTGATTT GGCTATACTC GACCCTGGTC 480

TTCCTGCCTT CCTTTTTCCA CTGGGGCAAA CCTGGATATC ATGGAGATGT GTTTCAGTGG 540

TGTGCGGAGT CCTGGCACAC CGACTCCTAC TTCACCCTGT TCATCGTGAT GATGTTATAT 600

GCCCCAGCAG CCCTTATTGT CTGCTTCACC TATTTCAACA TCTTCCGCAT CTGCCAACAG 660

CACACAAAGG ATATCAGCGA AAGGCAAGCC CGCTTCAGCA GCCAGAGTGG GGAGACTGGG 720

GAAGTGCAGG CCTGTCCTGA TAAGCGCTAT GCCATGGTCC TGTTTCGAAT CACTAGTGTA 780

TTTTACATCC TCTGGTTGCC ATATATCATC TACTTCTTGT TGGAAAGCTC CACTGGCCAC 840

AGCAACCGCT TCGCATCCTT CTTGACCACC TGGCTTGCTA TTAGTAACAG TTTCTGCAAC 900

TGTGTAATTT ATAGTCTCTC CAACAGTGTA TTCCAAAGAG GACTAAAGCG CCTCTCAGGG 960

GCTATGTGTA CTTCTTGTGC AAGTCAGACT ACAGCCAACG ACCCTTACAC AGTTAGAAGC 1020

AAAGGCCCTC TTAATGGATG TCATATCTGA 1050
(45) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
Met Asn Ser Thr Leu Asp Gly Asn Gln Ser Ser His Pro Phe Cys Leu
1 5 10 15

Leu Ala Phe Gly Tyr Leu Glu Thr Val Asn Phe Cys Leu Leu Glu Val
20 25 30

Leu Ile Ile Val Phe Leu Thr Val Leu Ile Ile Ser Gly Asn Ile Ile
35 40 45

Val Ile Phe Val Phe His Cys Ala Pro Leu Leu Asn His His Thr Thr
50 55 60

Ser Tyr Phe Ile Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val
65 70 75 80

Ser Cys Val Val Pro Ser Leu Ser Leu Leu His His Pro Leu Pro Val
85 90 95

Glu Glu Ser Leu Thr Cys Gln Ile Phe Gly Phe Val Val Ser Val Leu
100 105 110

Lys Ser Val Ser Met Ala Ser Leu Ala Cys Ile Ser Ile Asp Arg Tyr
115 120 125

Ile Ala Ile Thr Lys Pro Leu Thr Tyr Asn Thr Leu Val Thr Pro Trp
130 135 140

Arg Leu Arg Leu Cys Ile Phe Leu Ile Trp Leu Tyr Ser Thr Leu Val
145 150 155 160

Phe Leu Pro Ser Phe Phe His Trp Gly Lys Pro Gly Tyr His Gly Asp
165 170 175

Val Phe Gln Trp Cys Ala Glu Ser Trp His Thr Asp Ser Tyr Phe Thr
180 185 190

Leu Phe Ile Val Met Met Leu Tyr Ala Pro Ala Ala Leu Ile Val Cys
195 200 205

Phe Thr Tyr Phe Asn Ile Phe Arg Ile Cys Gln Gln His Thr Lys Asp
210 215 220

Ile Ser Glu Arg Gln Ala Arg Phe Ser Ser Gln Ser Gly Glu Thr Gly
225 230 235 240

Glu Val Gln Ala Cys Pro Asp Lys Arg Tyr Ala Met Val Leu Phe Arg
245 250 255

Ile Thr Ser Val Phe Tyr Ile Leu Trp Leu Pro Tyr Ile Ile Tyr Phe
260 265 270

Leu Leu Glu Ser Ser Thr Gly His Ser Asn Arg Phe Ala Ser Phe Leu
275 280 285

Thr Thr Trp Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr
290 295 300

Ser Leu Ser Asn Ser Val Phe Gln Arg Gly Leu Lys Arg Leu Ser Gly
305 310 315 320

Ala Met Cys Thr Ser Cys Ala Ser Gln Thr Thr Ala Asn Asp Pro Tyr
325 330 335

Thr Val Arg Ser Lys Gly Pro Leu Asn Gly Cys His Ile
340 345

(46) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

TCCCCCGGGA AAAAAACCAA CTGCTCCAAA 30

(47) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

TAGGATCCAT TTGAATGTGG ATTTGGTGAA A 31

(48) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

ATGTGTTTTT CTCCCATTCT GGAAATCAAC ATGCAGTCTG AATCTAACAT TACAGTGCGA 60

GATGACATTG ATGACATCAA CACCAATATG TACCAACCAC TATCATATCC GTTAAGCTTT 120

CAAGTGTCTC TCACCGGATT TCTTATGTTA GAAATTGTGT TGGGACTTGG CAGCAACCTC 180
ACTGTATTGG TACTTTACTG CATGAAATCC AACTTAATCA ACTCTGTCAG TAACATTATT 240

ACAATGAATC TTCATGTACT TGATGTAATA ATTTGTGTGG GATGTATTCC TCTAACTATA 300

GTTATCCTTC TGCTTTCACT GGAGAGTAAC ACTGCTCTCA TTTGCTGTTT CCATGAGGCT 360

TGTGTATCTT TTGCAAGTGT CTCAACAGCA ATCAACGTTT TTGCTATCAC TTTGGACAGA 420

TATGACATCT CTGTAAAACC TGCAAACCGA ATTCTGACAA TGGGCAGAGC TGTAATGTTA 480

ATGATATCCA TTTGGATTTT TTCTTTTTTC TCTTTCCTGA TTCCTTTTAT TGAGGTAAAT 540

TTTTTCAGTC TTCAAAGTGG AAATACCTGG GAAAACAAGA CACTTTTATG TGTCAGTACA 600

AATGAATACT ACACTGAACT GGGAATGTAT TATCACCTGT TAGTACAGAT CCCAATATTC 660

TTTTTCACTG TTGTAGTAAT GTTAATCACA TACACCAAAA TACTTCAGGC TCTTAATATT 720

CGAATAGGCA CAAGATTTTC AACAGGGCAG AAGAAGAAAG CAAGAAAGAA AAAGACAATT 780

TCTCTAACCA CACAACATGA GGCTACAGAC ATGTCACAAA GCAGTGGTGG GAGAAATGTA 840

GTCTTTGGTG TAAGAACTTC AGTTTCTGTA ATAATTGCCC TCCGGCGAGC TGTGAAACGA 900

CACCGTGAAC GACGAGAAAG ACAAAAGAGA GTCTTCAGGA TGTCTTTATT GATTATTTCT 960

ACATTTCTTC TCTGCTGGAC ACCAATTTCT GTTTTAAATA CCACCATTTT ATGTTTAGGC 1020

CCAAGTGACC TTTTAGTAAA ATTAAGATTG TGTTTTTTAG TCATGGCTTA TGGAACAACT 1080

ATATTTCACC CTCTATTATA TGCATTCACT AGACAAAAAT TTCAAAAGGT CTTGAAAAGT 1140

AAAATGAAAA AGCGAGTTGT TTCTATAGTA GAAGCTGATC CCCTGCCTAA TAATGCTGTA 1200

ATACACAACT CTTGGATAGA TCCCAAAAGA AACAAAAAAA TTACCTTTGA AGATAGTGAA 1260

ATAAGAGAAA AACGTTTAGT GCCTCAGGTT GTCACAGACT AG 1302

(49) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 433 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Met Cys Phe Ser Pro Ile Leu Glu Ile Asn Met Gln Ser Glu Ser Asn
1 5 10 15

Ile Thr Val Arg Asp Asp Ile Asp Asp Ile Asn Thr Asn Met Tyr Gln
20 25 30

Pro Leu Ser Tyr Pro Leu Ser Phe Gln Val Ser Leu Thr Gly Phe Leu
35 40 45

Met Leu Glu Ile Val Leu Gly Leu Gly Ser Asn Leu Thr Val Leu Val
50 55 60

Leu Tyr Cys Met Lys Ser Asn Leu Ile Asn Ser Val Ser Asn Ile Ile
65 70 75 80

Thr Met Asn Leu His Val Leu Asp Val Ile Ile Cys Val Gly Cys Ile
85 90 95

Pro Leu Thr Ile Val Ile Leu Leu Leu Ser Leu Glu Ser Asn Thr Ala
100 105 110

Leu Ile Cys Cys Phe His Glu Ala Cys Val Ser Phe Ala Ser Val Ser
115 120 125

Thr Ala Ile Asn Val Phe Ala Ile Thr Leu Asp Arg Tyr Asp Ile Ser
130 135 140

Val Lys Pro Ala Asn Arg Ile Leu Thr Met Gly Arg Ala Val Met Leu
145 150 155 160

Met Ile Ser Ile Trp Ile Phe Ser Phe Phe Ser Phe Leu Ile Pro Phe
165 170 175

Ile Glu Val Asn Phe Phe Ser Leu Gln Ser Gly Asn Thr Trp Glu Asn
180 185 190

Lys Thr Leu Leu Cys Val Ser Thr Asn Glu Tyr Tyr Thr Glu Leu Gly
195 200 205

Met Tyr Tyr His Leu Leu Val Gln Ile Pro Ile Phe Phe Phe Thr Val
210 215 220

Val Val Met Leu Ile Thr Tyr Thr Lys Ile Leu Gln Ala Leu Asn Ile
225 230 235 240

Arg Ile Gly Thr Arg Phe Ser Thr Gly Gln Lys Lys Lys Ala Arg Lys
245 250 255

Lys Lys Thr Ile Ser Leu Thr Thr Gln His Glu Ala Thr Asp Met Ser
260 265 270

Gln Ser Ser Gly Gly Arg Asn Val Val Phe Gly Val Arg Thr Ser Val
275 280 285

Ser Val Ile Ile Ala Leu Arg Arg Ala Val Lys Arg His Arg Glu Arg
290 295 300

Arg Glu Arg Gln Lys Arg Val Phe Arg Met Ser Leu Leu Ile Ile Ser
305 310 315 320

Thr Phe Leu Leu Cys Trp Thr Pro Ile Ser Val Leu Asn Thr Thr Ile
325 330 335

Leu Cys Leu Gly Pro Ser Asp Leu Leu Val Lys Leu Arg Leu Cys Phe
340 345 350

Leu Val Met Ala Tyr Gly Thr Thr Ile Phe His Pro Leu Leu Tyr Ala
355 360 365

Phe Thr Arg Gln Lys Phe Gln Lys Val Leu Lys Ser Lys Met Lys Lys
370 375 380

Arg Val Val Ser Ile Val Glu Ala Asp Pro Leu Pro Asn Asn Ala Val
385 390 395 400

Ile His Asn Ser Trp Ile Asp Pro Lys Arg Asn Lys Lys Ile Thr Phe
405 410 415

Glu Asp Ser Glu Ile Arg Glu Lys Arg Leu Val Pro Gln Val Val Thr
420 425 430

Asp



(50) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GTGAAGCTTG CCTCTGGTGC CTGCAGGAGG 30

(51) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GCAGAATTCC CGGTGGCGTG TTGTGGTGCC C 31

(52) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1209 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

ATGTTGTGTC CTTCCAAGAC AGATGGCTCA GGGCACTCTG GTAGGATTCA CCAGGAAACT 60

CATGGAGAAG GGAAAAGGGA CAAGATTAGC AACAGTGAAG GGAGGGAGAA TGGTGGGAGA 120

GGATTCCAGA TGAACGGTGG GTCGCTGGAG GCTGAGCATG CCAGCAGGAT GTCAGTTCTC 180

AGAGCAAAGC CCATGTCAAA CAGCCAACGC TTGCTCCTTC TGTCCCCAGG ATCACCTCCT 240

CGCACGGGGA GCATCTCCTA CATCAACATC ATCATGCCTT CGGTGTTCGG CACCATCTGC 300

CTCCTGGGCA TCATCGGGAA CTCCACGGTC ATCTTCGCGG TCGTGAAGAA GTCCAAGCTG 360

CACTGGTGCA ACAACGTCCC CGACATCTTC ATCATCAACC TCTCGGTAGT AGATCTCCTC 420

TTTCTCCTGG GCATGCCCTT CATGATCCAC CAGCTCATGG GCAATGGGGT GTGGCACTTT 480

GGGGAGACCA TGTGCACCCT CATCACGGCC ATGGATGCCA ATAGTCAGTT CACCAGCACC 540

TACATCCTGA CCGCCATGGC CATTGACCGC TACCTGGCCA CTGTCCACCC CATCTCTTCC 600

ACGAAGTTCC GGAAGCCCTC TGTGGCCACC CTGGTGATCT GCCTCCTGTG GGCCCTCTCC 660

TTCATCAGCA TCACCCCTGT GTGGCTGTAT GCCAGACTCA TCCCCTTCCC AGGAGGTGCA 720

GTGGGCTGCG GCATACGCCT GCCCAACCCA GACACTGACC TCTACTGGTT CACCCTGTAC 780

CAGTTTTTCC TGGCCTTTGC CCTGCCTTTT GTGGTCATCA CAGCCGCATA CGTGAGGATC 840

CTGCAGCGCA TGACGTCCTC AGTGGCCCCC GCCTCCCAGC GCAGCATCCG GCTGCGGACA 900

AAGAGGGTGA CCCGCACAGC CATCGCCATC TGTCTGGTCT TCTTTGTGTG CTGGGCACCC 960

TACTATGTGC TACAGCTGAC CCAGTTGTCC ATCAGCCGCC CGACCCTCAC CTTTGTCTAC 1020

TTATACAATG CGGCCATCAG CTTGGGCTAT GCCAACAGCT GCCTCAACCC CTTTGTGTAC 1080

ATCGTGCTCT GTGAGACGTT CCGCAAACGC TTGGTCCTGT CGGTGAAGCC TGCAGCCCAG 1140

GGGCAGCTTC GCGCTGTCAG CAACGCTCAG ACGGCTGACG AGGAGAGGAC AGAAAGCAAA 1200

GGCACCTGA 1209
(53) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Met Leu Cys Pro Ser Lys Thr Asp Gly Ser Gly His Ser Gly Arg Ile
1 5 10 15

His Gln Glu Thr His Gly Glu Gly Lys Arg Asp Lys Ile Ser Asn Ser
20 25 30

Glu Gly Arg Glu Asn Gly Gly Arg Gly Phe Gln Met Asn Gly Gly Ser
35 40 45

Leu Glu Ala Glu His Ala Ser Arg Met Ser Val Leu Arg Ala Lys Pro
50 55 60

Met Ser Asn Ser Gln Arg Leu Leu Leu Leu Ser Pro Gly Ser Pro Pro
65 70 75 80

Arg Thr Gly Ser Ile Ser Tyr Ile Asn Ile Ile Met Pro Ser Val Phe
85 90 95

Gly Thr Ile Cys Leu Leu Gly Ile Ile Gly Asn Ser Thr Val Ile Phe
100 105 110

Ala Val Val Lys Lys Ser Lys Leu His Trp Cys Asn Asn Val Pro Asp
115 120 125

Ile Phe Ile Ile Asn Leu Ser Val Val Asp Leu Leu Phe Leu Leu Gly
130 135 140

Met Pro Phe Met Ile His Gln Leu Met Gly Asn Gly Val Trp His Phe
145 150 155 160

Gly Glu Thr Met Cys Thr Leu Ile Thr Ala Met Asp Ala Asn Ser Gln
165 170 175

Phe Thr Ser Thr Tyr Ile Leu Thr Ala Met Ala Ile Asp Arg Tyr Leu
180 185 190

Ala Thr Val His Pro Ile Ser Ser Thr Lys Phe Arg Lys Pro Ser Val
195 200 205

Ala Thr Leu Val Ile Cys Leu Leu Trp Ala Leu Ser Phe Ile Ser Ile
210 215 220

Thr Pro Val Trp Leu Tyr Ala Arg Leu Ile Pro Phe Pro Gly Gly Ala
225 230 235 240

Val Gly Cys Gly Ile Arg Leu Pro Asn Pro Asp Thr Asp Leu Tyr Trp
245 250 255

Phe Thr Leu Tyr Gln Phe Phe Leu Ala Phe Ala Leu Pro Phe Val Val
260 265 270

Ile Thr Ala Ala Tyr Val Arg Ile Leu Gln Arg Met Thr Ser Ser Val
275 280 285

Ala Pro Ala Ser Gln Arg Ser Ile Arg Leu Arg Thr Lys Arg Val Thr
290 295 300

Arg Thr Ala Ile Ala Ile Cys Leu Val Phe Phe Val Cys Trp Ala Pro
305 310 315 320

Tyr Tyr Val Leu Gln Leu Thr Gln Leu Ser Ile Ser Arg Pro Thr Leu
325 330 335

Thr Phe Val Tyr Leu Tyr Asn Ala Ala Ile Ser Leu Gly Tyr Ala Asn
340 345 350

Ser Cys Leu Asn Pro Phe Val Tyr Ile Val Leu Cys Glu Thr Phe Arg
355 360 365

Lys Arg Leu Val Leu Ser Val Lys Pro Ala Ala Gln Gly Gln Leu Arg
370 375 380

Ala Val Ser Asn Ala Gln Thr Ala Asp Glu Glu Arg Thr Glu Ser Lys
385 390 395 400

Gly Thr

(54) INFORMATION FOR SEQ ID NO:53: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GGCGGATCCA TGGATGTGAC TTCCCAA 27

(55) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GGCGGATCCC TACACGGCAC TGCTGAA 27
(56) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAC 60

GCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GCTCCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCGCCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTGA 1128
(57) INFORMATION FOR SEQ ID NO: 56:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 375 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala His Ala Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Leu Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Ala Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375
370 375 

(58) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
AAGGAATTCA CGGCCGGGTG ATGCCATTCC C 31 

(59) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
GGTGGATCCA TAAACACGGG CGTTGAGGAC 30

(60) INFORMATION FOR SEQ ID NO: 59:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 960 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 

ATGCCATTCC CAAACTGCTC AGCCCCCAGC ACTGTGGTGG CCACAGCTGT GGGTGTCTTG 60

CTGGGGCTGG AGTGTGGGCT GGGTCTGCTG GGCAACGCGG TGGCGCTGTG GACCTTCCTG 120

TTCCGGGTCA GGGTGTGGAA GCCGTACGCT GTCTACCTGC TCAACCTGGC CCTGGCTGAC 180

CTGCTGTTGG CTGCGTGCCT GCCTTTCCTG GCCGCCTTCT ACCTGAGCCT CCAGGCTTGG 240

CATCTGGGCC GTGTGGGCTG CTGGGCCCTG CGCTTCCTGC TGGACCTCAG CCGCAGCGTG 300

GGGATGGCCT TCCTGGCCGC CGTGGCTTTG GACCGGTACC TCCGTGTGGT CCACCCTCGG 360

CTTAAGGTCA ACCTGCTGTC TCCTCAGGCG GCCCTGGGGG TCTCGGGCCT CGTCTGGCTC 420

CTGATGGTCG CCCTCACCTG CCCGGGCTTG CTCATCTCTG AGGCCGCCCA GAACTCCACC 480

AGGTGCCACA GTTTCTACTC CAGGGCAGAC GGCTCCTTCA GCATCATCTG GCAGGAAGCA 540

CTCTCCTGCC TTCAGTTTGT CCTCCCCTTT GGCCTCATCG TGTTCTGCAA TGCAGGCATC 600

ATCAGGGCTC TCCAGAAAAG ACTCCGGGAG CCTGAGAAAC AGCCCAAGCT TCAGCGGGCC 660

CAGGCACTGG TCACCTTGGT GGTGGTGCTG TTTGCTCTGT GCTTTCTGCC CTGCTTCCTG 720

GCCAGAGTCC TGATGCACAT CTTCCAGAAT CTGGGGAGCT GCAGGGCCCT TTGTGCAGTG 780

GCTCATACCT CGGATGTCAC GGGCAGCCTC ACCTACCTGC ACAGTGTCGT CAACCCCGTG 840

GTATACTGCT TCTCCAGCCC CACCTTCAGG AGCTCCTATC GGAGGGTCTT CCACACCCTC 900

CGAGGCAAAG GGCAGGCAGC AGAGCCCCCA GATTTCAACC CCAGAGACTC CTATTCCTGA 960
(61) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 319 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: 
Met Pro Phe Pro Asn Cys Ser Ala Pro Ser Thr Val Val Ala Thr Ala
1 5 10 15

Val Gly Val Leu Leu Gly Leu Glu Cys Gly Leu Gly Leu Leu Gly Asn
20 25 30

Ala Val Ala Leu Trp Thr Phe Leu Phe Arg Val Arg Val Trp Lys Pro
35 40 45

Tyr Ala Val Tyr Leu Leu Asn Leu Ala Leu Ala Asp Leu Leu Leu Ala
50 55 60

Ala Cys Leu Pro Phe Leu Ala Ala Phe Tyr Leu Ser Leu Gln Ala Trp
65 70 75 80

His Leu Gly Arg Val Gly Cys Trp Ala Leu Arg Phe Leu Leu Asp Leu
85 90 95

Ser Arg Ser Val Gly Met Ala Phe Leu Ala Ala Val Ala Leu Asp Arg
100 105 110

Tyr Leu Arg Val Val His Pro Arg Leu Lys Val Asn Leu Leu Ser Pro
115 120 125

Gln Ala Ala Leu Gly Val Ser Gly Leu Val Trp Leu Leu Met Val Ala
130 135 140

Leu Thr Cys Pro Gly Leu Leu Ile Ser Glu Ala Ala Gln Asn Ser Thr
145 150 155 160

Arg Cys His Ser Phe Tyr Ser Arg Ala Asp Gly Ser Phe Ser Ile Ile
165 170 175

Trp Gln Glu Ala Leu Ser Cys Leu Gln Phe Val Leu Pro Phe Gly Leu
180 185 190

Ile Val Phe Cys Asn Ala Gly Ile Ile Arg Ala Leu Gln Lys Arg Leu
195 200 205

Arg Glu Pro Glu Lys Gln Pro Lys Leu Gln Arg Ala Gln Ala Leu Val
210 215 220

Thr Leu Val Val Val Leu Phe Ala Leu Cys Phe Leu Pro Cys Phe Leu
225 230 235 240

Ala Arg Val Leu Met His Ile Phe Gln Asn Leu Gly Ser Cys Arg Ala
245 250 255

Leu Cys Ala Val Ala His Thr Ser Asp Val Thr Gly Ser Leu Thr Tyr
260 265 270

Leu His Ser Val Val Asn Pro Val Val Tyr Cys Phe Ser Ser Pro Thr
275 280 285

Phe Arg Ser Ser Tyr Arg Arg Val Phe His Thr Leu Arg Gly Lys Gly
290 295 300

Gln Ala Ala Glu Pro Pro Asp Phe Asn Pro Arg Asp Ser Tyr Ser
305 310 315

(62) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1143 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

ATGGAGGAAG GTGGTGATTT TGACAACTAC TATGGGGCAG ACAACCAGTC TGAGTGTGAG 60

TACACAGACT GGAAATCCTC GGGGGCCCTC ATCCCTGCCA TCTACATGTT GGTCTTCCTC 120

CTGGGCACCA CGGGAAACGG TCTGGTGCTC TGGACCGTGT TTCGGAGCAG CCGGGAGAAG 180

AGGCGCTCAG CTGATATCTT CATTGCTAGC CTGGCGGTGG CTGACCTGAC CTTCGTGGTG 240

ACGCTGCCCC TGTGGGCTAC CTACACGTAC CGGGACTATG ACTGGCCCTT TGGGACCTTC 300

TTCTGCAAGC TCAGCAGCTA CCTCATCTTC GTCAACATGT ACGCCAGCGT CTTCTGCCTC 360

ACCGGCCTCA GCTTCGACCG CTACCTGGCC ATCGTGAGGC CAGTGGCCAA TGCTCGGCTG 420

AGGCTGCGGG TCAGCGGGGC CGTGGCCACG GCAGTTCTTT GGGTGCTGGC CGCCCTCCTG 480

GCCATGCCTG TCATGGTGTT ACGCACCACC GGGGACTTGG AGAACACCAC TAAGGTGCAG 540

TGCTACATGG ACTACTCCAT GGTGGCCACT GTGAGCTCAG AGTGGGCCTG GGAGGTGGGC 600

CTTGGGGTCT CGTCCACCAC CGTGGGCTTT GTGGTGCCCT TCACCATCAT GCTGACCTGT 660

TACTTCTTCA TCGCCCAAAC CATCGCTGGC CACTTCCGCA AGGAACGCAT CGAGGGCCTG 720

CGGAAGCGGC GCCGGCTGCT CAGCATCATC GTGGTGCTGG TGGTGACCTT TGCCCTGTGC 780

TGGATGCCCT ACCACCTGGT GAAGACGCTG TACATGCTGG GCAGCCTGCT GCACTGGCCC 840

TGTGACTTTG ACCTCTTCCT CATGAACATC TTCCCCTACT GCACCTGCAT CAGCTACGTC 900

AACAGCTGCC TCAACCCCTT CCTCTATGCC TTTTTCGACC CCCGCTTCCG CCAGGCCTGC 960

ACCTCCATGC TCTGCTGTGG CCAGAGCAGG TGCGCAGGCA CCTCCCACAG CAGCAGTGGG 1020

GAGAAGTCAG CCAGCTACTC TTCGGGGCAC AGCCAGGGGC CCGGCCCCAA CATGGGCAAG 1080

GGTGGAGAAC AGATGCACGA GAAATCCATC CCCTACAGCC AGGAGACCCT TGTGGTTGAC 1140

TAG 1143
(63) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 380 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: 

Met Glu Glu Gly Gly Asp Phe Asp Asn Tyr Tyr Gly Ala Asp Asn Gln
1 5 10 15

Ser Glu Cys Glu Tyr Thr Asp Trp Lys Ser Ser Gly Ala Leu Ile Pro
20 25 30

Ala Ile Tyr Met Leu Val Phe Leu Leu Gly Thr Thr Gly Asn Gly Leu
35 40 45

Val Leu Trp Thr Val Phe Arg Ser Ser Arg Glu Lys Arg Arg Ser Ala
50 55 60

Asp Ile Phe Ile Ala Ser Leu Ala Val Ala Asp Leu Thr Phe Val Val
65 70 75 80

Thr Leu Pro Leu Trp Ala Thr Tyr Thr Tyr Arg Asp Tyr Asp Trp Pro
85 90 95

Phe Gly Thr Phe Phe Cys Lys Leu Ser Ser Tyr Leu Ile Phe Val Asn
100 105 110

Met Tyr Ala Ser Val Phe Cys Leu Thr Gly Leu Ser Phe Asp Arg Tyr
115 120 125

Leu Ala Ile Val Arg Pro Val Ala Asn Ala Arg Leu Arg Leu Arg Val
130 135 140

Ser Gly Ala Val Ala Thr Ala Val Leu Trp Val Leu Ala Ala Leu Leu
145 150 155 160

Ala Met Pro Val Met Val Leu Arg Thr Thr Gly Asp Leu Glu Asn Thr
165 170 175

Thr Lys Val Gln Cys Tyr Met Asp Tyr Ser Met Val Ala Thr Val Ser
180 185 190

Ser Glu Trp Ala Trp Glu Val Gly Leu Gly Val Ser Ser Thr Thr Val
195 200 205

Gly Phe Val Val Pro Phe Thr Ile Met Leu Thr Cys Tyr Phe Phe Ile
210 215 220

Ala Gln Thr Ile Ala Gly His Phe Arg Lys Glu Arg Ile Glu Gly Leu
225 230 235 240

Arg Lys Arg Arg Arg Leu Leu Ser Ile Ile Val Val Leu Val Val Thr
245 250 255

Phe Ala Leu Cys Trp Met Pro Tyr His Leu Val Lys Thr Leu Tyr Met
260 265 270

Leu Gly Ser Leu Leu His Trp Pro Cys Asp Phe Asp Leu Phe Leu Met
275 280 285

Asn Ile Phe Pro Tyr Cys Thr Cys Ile Ser Tyr Val Asn Ser Cys Leu
290 295 300

Asn Pro Phe Leu Tyr Ala Phe Phe Asp Pro Arg Phe Arg Gln Ala Cys
305 310 315 320

Thr Ser Met Leu Cys Cys Gly Gln Ser Arg Cys Ala Gly Thr Ser His
325 330 335

Ser Ser Ser Gly Glu Lys Ser Ala Ser Tyr Ser Ser Gly His Ser Gln
340 345 350

Gly Pro Gly Pro Asn Met Gly Lys Gly Gly Glu Gln Met His Glu Lys
355 360 365

Ser Ile Pro Tyr Ser Gln Glu Thr Leu Val Val Asp
370 375 380

(64) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: 

TGAGAATTCT GGTGACTCAC AGCCGGCACA G 31 

(65) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
GCCGGATCCA AGGAAAAGCA GCAATAAAAG G 31

(66) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

ATGAACTACC CGCTAACGCT GGAAATGGAC CTCGAGAACC TGGAGGACCT GTTCTGGGAA 60

CTGGACAGAT TGGACAACTA TAACGACACC TCCCTGGTGG AAAATCATCT CTGCCCTGCC 120

ACAGAGGGTC CCCTCATGGC CTCCTTCAAG GCCGTGTTCG TGCCCGTGGC CTACAGCCTC 180

ATCTTCCTCC TGGGCGTGAT CGGCAACGTC CTGGTGCTGG TGATCCTGGA GCGGCACCGG 240

CAGACACGCA GTTCCACGGA GACCTTCCTG TTCCACCTGG CCGTGGCCGA CCTCCTGCTG 300

GTCTTCATCT TGCCCTTTGC CGTGGCCGAG GGCTCTGTGG GCTGGGTCCT GGGGACCTTC 360

CTCTGCAAAA CTGTGATTGC CCTGCACAAA GTCAACTTCT ACTGCAGCAG CCTGCTCCTG 420

GCCTGCATCG CCGTGGACCG CTACCTGGCC ATTGTCCACG CCGTCCATGC CTACCGCCAC 480

CGCCGCCTCC TCTCCATCCA CATCACCTGT GGGACCATCT GGCTGGTGGG CTTCCTCCTT 540

GCCTTGCCAG AGATTCTCTT CGCCAAAGTC AGCCAAGGCC ATCACAACAA CTCCCTGCCA 600

CGTTGCACCT TCTCCCAAGA GAACCAAGCA GAAACGCATG CCTGGTTCAC CTCCCGATTC 660

CTCTACCATG TGGCGGGATT CCTGCTGCCC ATGCTGGTGA TGGGCTGGTG CTACGTGGGG 720

GTAGTGCACA GGTTGCGCCA GGCCCAGCGG CGCCCTCAGC GGCAGAAGGC AGTCAGGGTG 780

GCCATCCTGG TGACAAGCAT CTTCTTCCTC TGCTGGTCAC CCTACCACAT CGTCATCTTC 840

CTGGACACCC TGGCGAGGCT GAAGGCCGTG GACAATACCT GCAAGCTGAA TGGCTCTCTC 900

CCCGTGGCCA TCACCATGTG TGAGTTCCTG GGCCTGGCCC ACTGCTGCCT CAACCCCATG 960

CTCTACACTT TCGCCGGCGT GAAGTTCCGC AGTGACCTGT CGCGGCTCCT GACCAAGCTG 1020

GGCTGTACCG GCCCTGCCTC CCTGTGCCAG CTCTTCCCTA GCTGGCGCAG GAGCAGTCTC 1080

TCTGAGTCAG AGAATGCCAC CTCTCTCACC ACGTTCTAG 1119

(67) INFORMATION FOR SEQ ID NO: 66: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66: 

Met Asn Tyr Pro Leu Thr Leu Glu Met Asp Leu Glu Asn Leu Glu Asp
1 5 10 15

Leu Phe Trp Glu Leu Asp Arg Leu Asp Asn Tyr Asn Asp Thr Ser Leu
20 25 30

Val Glu Asn His Leu Cys Pro Ala Thr Glu Gly Pro Leu Met Ala Ser
35 40 45

Phe Lys Ala Val Phe Val Pro Val Ala Tyr Ser Leu Ile Phe Leu Leu
50 55 60

Gly Val Ile Gly Asn Val Leu Val Leu Val Ile Leu Glu Arg His Arg
65 70 75 80

Gln Thr Arg Ser Ser Thr Glu Thr Phe Leu Phe His Leu Ala Val Ala
85 90 95

Asp Leu Leu Leu Val Phe Ile Leu Pro Phe Ala Val Ala Glu Gly Ser
100 105 110

Val Gly Trp Val Leu Gly Thr Phe Leu Cys Lys Thr Val Ile Ala Leu
115 120 125

His Lys Val Asn Phe Tyr Cys Ser Ser Leu Leu Leu Ala Cys Ile Ala
130 135 140

Val Asp Arg Tyr Leu Ala Ile Val His Ala Val His Ala Tyr Arg His
145 150 155 160

Arg Arg Leu Leu Ser Ile His Ile Thr Cys Gly Thr Ile Trp Leu Val
165 170 175

Gly Phe Leu Leu Ala Leu Pro Glu Ile Leu Phe Ala Lys Val Ser Gln
180 185 190

Gly His His Asn Asn Ser Leu Pro Arg Cys Thr Phe Ser Gln Glu Asn
195 200 205

Gln Ala Glu Thr His Ala Trp Phe Thr Ser Arg Phe Leu Tyr His Val
210 215 220

Ala Gly Phe Leu Leu Pro Met Leu Val Met Gly Trp Cys Tyr Val Gly
225 230 235 240

Val Val His Arg Leu Arg Gln Ala Gln Arg Arg Pro Gln Arg Gln Lys
245 250 255

Ala Val Arg Val Ala Ile Leu Val Thr Ser Ile Phe Phe Leu Cys Trp
260 265 270

Ser Pro Tyr His Ile Val Ile Phe Leu Asp Thr Leu Ala Arg Leu Lys
275 280 285

Ala Val Asp Asn Thr Cys Lys Leu Asn Gly Ser Leu Pro Val Ala Ile
290 295 300

Thr Met Cys Glu Phe Leu Gly Leu Ala His Cys Cys Leu Asn Pro Met
305 310 315 320

Leu Tyr Thr Phe Ala Gly Val Lys Phe Arg Ser Asp Leu Ser Arg Leu
325 330 335

Leu Thr Lys Leu Gly Cys Thr Gly Pro Ala Ser Leu Cys Gln Leu Phe
340 345 350

Pro Ser Trp Arg Arg Ser Ser Leu Ser Glu Ser Glu Asn Ala Thr Ser
355 360 365

Leu Thr Thr Phe
370

(68) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

CAAAGCTTGA AAGCTGCACG GTGCAGAGAC 30 

(69) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 
GCGGATCCCG AGTCACACCC TGGCTGGGCC 30

(70) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAG 60 

CCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GCTCCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCACCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTAG 1128 
(71) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala Gln Pro Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Leu Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Thr Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375
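The DNA and protein entries in this listing are paired through the standard genetic code. The following Python sketch is not part of the original listing; the helper names `translate` and `protein_length` are hypothetical, and it assumes each coding sequence is a complete open reading frame that ends in a single stop codon. It translates the last line of SEQ ID NO:69 and computes the protein length implied by the 1128-base-pair entry:

```python
# Standard genetic code; codons are indexed with the bases in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA into one-letter amino acid codes ('*' = stop)."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


def protein_length(cds_bp: int) -> int:
    """Residues encoded by an ORF of cds_bp bases ending in a single stop codon."""
    return cds_bp // 3 - 1


# Last line of SEQ ID NO:69 (bases 1081-1128). Because 1080 is a multiple
# of 3, this line starts on a codon boundary of the reading frame.
tail = "CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTAG"
print(translate(tail))       # PDSTEQSDVRFSSAV*
print(protein_length(1128))  # 375
```

The one-letter string PDSTEQSDVRFSSAV corresponds to the final residues Pro Asp Ser Thr Glu Gln Ser Asp Val Arg Phe Ser Ser Ala Val of SEQ ID NO:70, and the same length rule reproduces the declared lengths of other pairs in this section, for example 960 base pairs and 319 amino acids for SEQ ID NOS:59 and 60.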

(72) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
ACAGAATTCC TGTGTGGTTT TACCGCCCAG 30
(73) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 
CTCGGATCCA GGCAGAAGAG TCGCCTATGG 30 
(74) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1137 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 

ATGGACCTGG GGAAACCAAT GAAAAGCGTG CTGGTGGTGG CTCTCCTTGT CATTTTCCAG 60

GTATGCCTGT GTCAAGATGA GGTCACGGAC GATTACATCG GAGACAACAC CACAGTGGAC 120

TACACTTTGT TCGAGTCTTT GTGCTCCAAG AAGGACGTGC GGAACTTTAA AGCCTGGTTC 180

CTCCCTATCA TGTACTCCAT CATTTGTTTC GTGGGCCTAC TGGGCAATGG GCTGGTCGTG 240

TTGACCTATA TCTATTTCAA GAGGCTCAAG ACCATGACCG ATACCTACCT GCTCAACCTG 300

GCGGTGGCAG ACATCCTCTT CCTCCTGACC CTTCCCTTCT GGGCCTACAG CGCGGCCAAG 360

TCCTGGGTCT TCGGTGTCCA CTTTTGCAAG CTCATCTTTG CCATCTACAA GATGAGCTTC 420

TTCAGTGGCA TGCTCCTACT TCTTTGCATC AGCATTGACC GCTACGTGGC CATCGTCCAG 480

GCTGTCTCAG CTCACCGCCA CCGTGCCCGC GTCCTTCTCA TCAGCAAGCT GTCCTGTGTG 540

GGCATCTGGA TACTAGCCAC AGTGCTCTCC ATCCCAGAGC TCCTGTACAG TGACCTCCAG 600

AGGAGCAGCA GTGAGCAAGC GATGCGATGC TCTCTCATCA CAGAGCATGT GGAGGCCTTT 660

ATCACCATCC AGGTGGCCCA GATGGTGATC GGCTTTCTGG TCCCCCTGCT GGCCATGAGC 720

TTCTGTTACC TTGTCATCAT CCGCACCCTG CTCCAGGCAC GCAACTTTGA GCGCAACAAG 780

GCCATCAAGG TGATCATCGC TGTGGTCGTG GTCTTCATAG TCTTCCAGCT GCCCTACAAT 840

GGGGTGGTCC TGGCCCAGAC GGTGGCCAAC TTCAACATCA CCAGTAGCAC CTGTGAGCTC 900

AGTAAGCAAC TCAACATCGC CTACGACGTC ACCTACAGCC TGGCCTGCGT CCGCTGCTGC 960

GTCAACCCTT TCTTGTACGC CTTCATCGGC GTCAAGTTCC GCAACGATCT CTTCAAGCTC 1020

TTCAAGGACC TGGGCTGCCT CAGCCAGGAG CAGCTCCGGC AGTGGTCTTC CTGTCGGCAC 1080

ATCCGGCGCT CCTCCATGAG TGTGGAGGCC GAGACCACCA CCACCTTCTC CCCATAG 1137
(75) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 378 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
Met Asp Leu Gly Lys Pro Met Lys Ser Val Leu Val Val Ala Leu Leu
1 5 10 15

Val Ile Phe Gln Val Cys Leu Cys Gln Asp Glu Val Thr Asp Asp Tyr
20 25 30

Ile Gly Asp Asn Thr Thr Val Asp Tyr Thr Leu Phe Glu Ser Leu Cys
35 40 45

Ser Lys Lys Asp Val Arg Asn Phe Lys Ala Trp Phe Leu Pro Ile Met
50 55 60

Tyr Ser Ile Ile Cys Phe Val Gly Leu Leu Gly Asn Gly Leu Val Val
65 70 75 80

Leu Thr Tyr Ile Tyr Phe Lys Arg Leu Lys Thr Met Thr Asp Thr Tyr
85 90 95

Leu Leu Asn Leu Ala Val Ala Asp Ile Leu Phe Leu Leu Thr Leu Pro
100 105 110

Phe Trp Ala Tyr Ser Ala Ala Lys Ser Trp Val Phe Gly Val His Phe
115 120 125

Cys Lys Leu Ile Phe Ala Ile Tyr Lys Met Ser Phe Phe Ser Gly Met
130 135 140

Leu Leu Leu Leu Cys Ile Ser Ile Asp Arg Tyr Val Ala Ile Val Gln
145 150 155 160

Ala Val Ser Ala His Arg His Arg Ala Arg Val Leu Leu Ile Ser Lys
165 170 175

Leu Ser Cys Val Gly Ile Trp Ile Leu Ala Thr Val Leu Ser Ile Pro
180 185 190

Glu Leu Leu Tyr Ser Asp Leu Gln Arg Ser Ser Ser Glu Gln Ala Met
195 200 205

Arg Cys Ser Leu Ile Thr Glu His Val Glu Ala Phe Ile Thr Ile Gln
210 215 220

Val Ala Gln Met Val Ile Gly Phe Leu Val Pro Leu Leu Ala Met Ser
225 230 235 240

Phe Cys Tyr Leu Val Ile Ile Arg Thr Leu Leu Gln Ala Arg Asn Phe
245 250 255

Glu Arg Asn Lys Ala Ile Lys Val Ile Ile Ala Val Val Val Val Phe
260 265 270

Ile Val Phe Gln Leu Pro Tyr Asn Gly Val Val Leu Ala Gln Thr Val
275 280 285

Ala Asn Phe Asn Ile Thr Ser Ser Thr Cys Glu Leu Ser Lys Gln Leu
290 295 300

Asn Ile Ala Tyr Asp Val Thr Tyr Ser Leu Ala Cys Val Arg Cys Cys
305 310 315 320

Val Asn Pro Phe Leu Tyr Ala Phe Ile Gly Val Lys Phe Arg Asn Asp
325 330 335

Leu Phe Lys Leu Phe Lys Asp Leu Gly Cys Leu Ser Gln Glu Gln Leu
340 345 350

Arg Gln Trp Ser Ser Cys Arg His Ile Arg Arg Ser Ser Met Ser Val
355 360 365

Glu Ala Glu Thr Thr Thr Thr Phe Ser Pro
370 375

(76) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

CTGGAATTCA CCTGGACCAC CACCAATGGA TA 32

(77) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
CTCGGATCCT GCAAAGTTTG TCATACAGTT 30

(78) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1085 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

ATGGATATAC AAATGGCAAA CAATTTTACT CCGCCCTCTG CAACTCCTCA GGGAAATGAC 60

TGTGACCTCT ATGCACATCA CAGCACGGCC AGGATAGTAA TGCCTCTGCA TTACAGCCTC 120

GTCTTCATCA TTGGGCTCGT GGGAAACTTA CTAGCCTTGG TCGTCATTGT TCAAAACAGG 180

AAAAAAATCA ACTCTACCAC CCTCTATTCA ACAAATTTGG TGATTTCTGA TATACTTTTT 240

ACCACGGCTT TGCCTACACG AATAGCCTAC TATGCAATGG GCTTTGACTG GAGAATCGGA 300

GATGCCTTGT GTAGGATAAC TGCGCTAGTG TTTTACATCA ACACATATGC AGGTGTGAAC 360

TTTATGACCT GCCTGAGTAT TGACCGCTTC ATTGCTGTGG TGCACCCTCT ACGCTACAAC 420

AAGATAAAAA GGATTGAACA TGCAAAAGGC GTGTGCATAT TTGTCTGGAT TCTAGTATTT 480

GCTCAGACAC TCCCACTCCT CATCAACCCT ATGTCAAAGC AGGAGGCTGA AAGGATTACA 540

TGCATGGAGT ATCCAAACTT TGAAGAAACT AAATCTCTTC CCTGGATTCT GCTTGGGGCA 600

TGTTTCATAG GATATGTACT TCCACTTATA ATCATTCTCA TCTGCTATTC TCAGATCTGC 660

TGCAAACTCT TCAGAACTGC CAAACAAAAC CCACTCACTG AGAAATCTGG TGTAAACAAA 720

AAGGCTCTCA ACACAATTAT TCTTATTATT GTTGTGTTTG TTCTCTGTTT CACACCTTAC 780

CATGTTGCAA TTATTCAACA TATGATTAAG AAGCTTCGTT TCTCTAATTT CCTGGAATGT 840

AGCCAAAGAC ATTCGTTCCA GATTTCTCTG CACTTTACAG TATGCCTGAT GAACTTCAAT 900

TGCTGCATGG ACCCTTTTAT CTACTTCTTT GCATGTAAAG GGTATAAGAG AAAGGTTATG 960

AGGATGCTGA AACGGCAAGT CAGTGTATCG ATTTCTAGTG CTGTGAAGTC AGCCCCTGAA 1020

GAAAATTCAC GTGAAATGAC AGAAACGCAG ATGATGATAC ATTCCAAGTC TTCAAATGGA 1080

AAGTGA 1086
(79) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 361 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:

Met Asp Ile Gln Met Ala Asn Asn Phe Thr Pro Pro Ser Ala Thr Pro
1 5 10 15

Gln Gly Asn Asp Cys Asp Leu Tyr Ala His His Ser Thr Ala Arg Ile
20 25 30

Val Met Pro Leu His Tyr Ser Leu Val Phe Ile Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Ala Leu Val Val Ile Val Gln Asn Arg Lys Lys Ile Asn
50 55 60

Ser Thr Thr Leu Tyr Ser Thr Asn Leu Val Ile Ser Asp Ile Leu Phe
65 70 75 80

Thr Thr Ala Leu Pro Thr Arg Ile Ala Tyr Tyr Ala Met Gly Phe Asp
85 90 95

Trp Arg Ile Gly Asp Ala Leu Cys Arg Ile Thr Ala Leu Val Phe Tyr
100 105 110

Ile Asn Thr Tyr Ala Gly Val Asn Phe Met Thr Cys Leu Ser Ile Asp
115 120 125

Arg Phe Ile Ala Val Val His Pro Leu Arg Tyr Asn Lys Ile Lys Arg
130 135 140

Ile Glu His Ala Lys Gly Val Cys Ile Phe Val Trp Ile Leu Val Phe
145 150 155 160

Ala Gln Thr Leu Pro Leu Leu Ile Asn Pro Met Ser Lys Gln Glu Ala
165 170 175

Glu Arg Ile Thr Cys Met Glu Tyr Pro Asn Phe Glu Glu Thr Lys Ser
180 185 190

Leu Pro Trp Ile Leu Leu Gly Ala Cys Phe Ile Gly Tyr Val Leu Pro
195 200 205

Leu Ile Ile Ile Leu Ile Cys Tyr Ser Gln Ile Cys Cys Lys Leu Phe
210 215 220

Arg Thr Ala Lys Gln Asn Pro Leu Thr Glu Lys Ser Gly Val Asn Lys
225 230 235 240

Lys Ala Leu Asn Thr Ile Ile Leu Ile Ile Val Val Phe Val Leu Cys
245 250 255

Phe Thr Pro Tyr His Val Ala Ile Ile Gln His Met Ile Lys Lys Leu
260 265 270

Arg Phe Ser Asn Phe Leu Glu Cys Ser Gln Arg His Ser Phe Gln Ile
275 280 285

Ser Leu His Phe Thr Val Cys Leu Met Asn Phe Asn Cys Cys Met Asp
290 295 300

Pro Phe Ile Tyr Phe Phe Ala Cys Lys Gly Tyr Lys Arg Lys Val Met
305 310 315 320

WO 00/22129 PCT/US99/23938

Arg Met Leu Lys Arg Gln Val Ser Val Ser Ile Ser Ser Ala Val Lys
325 330 335

Ser Ala Pro Glu Glu Asn Ser Arg Glu Met Thr Glu Thr Gln Met Met
340 345 350

Ile His Ser Lys Ser Ser Asn Gly Lys
355 360



(80) INFORMATION FOR SEQ ID NO: 79: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

CTGGAATTCT CCTGCTCATC CAGCCATGCG G 31
(81) INFORMATION FOR SEQ ID NO: 80: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
CCTGGATCCC CACCCCTACT GGGGCCTCAG 30

(82) INFORMATION FOR SEQ ID NO: 81: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1446 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 



ATGCGGTGGC TGTGGCCCCT GGCTGTCTCT CTTGCTGTGA TTTTGGCTGT GGGGCTAAGC 60

AGGGTCTCTG GGGGTGCCCC CCTGCACCTG GGCAGGCACA GAGCCGAGAC CCAGGAGCAG 120

CAGAGCCGAT CCAAGAGGGG CACCGAGGAT GAGGAGGCCA AGGGCGTGCA GCAGTATGTG 180

CCTGAGGAGT GGGCGGAGTA CCCCCGGCCC ATTCACCCTG CTGGCCTGCA GCCAACCAAG 240

CCCTTGGTGG CCACCAGCCC TAACCCCGAC AAGGATGGGG GCACCCCAGA CAGTGGGCAG 300

GAACTGAGGG GCAATCTGAC AGGGGCACCA GGGCAGAGGC TACAGATCCA GAACCCCCTG 360

TATCCGGTGA CCGAGAGCTC CTACAGTGCC TATGCCATCA TGCTTCTGGC GCTGGTGGTG 420

TTTGCGGTGG GCATTGTGGG CAACCTGTCG GTCATGTGCA TCGTGTGGCA CAGCTACTAC 480

CTGAAGAGCG CCTGGAACTC CATCCTTGCC AGCCTGGCCC TCTGGGATTT TCTGGTCCTC 540

TTTTTCTGCC TCCCTATTGT CATCTTCAAC GAGATCACCA AGCAGAGGCT ACTGGGTGAC 600

GTTTCTTGTC GTGCCGTGCC CTTCATGGAG GTCTCCTCTC TGGGAGTCAC GACTTTCAGC 660

CTCTGTGCCC TGGGCATTGA CCGCTTCCAC GTGGCCACCA GCACCCTGCC CAAGGTGAGG 720

CCCATCGAGC GGTGCCAATC CATCCTGGCC AAGTTGGCTG TCATCTGGGT GGGCTCCATG 780

ACGCTGGCTG TGCCTGAGCT CCTGCTGTGG CAGCTGGCAC AGGAGCCTGC CCCCACCATG 840

GGCACCCTGG ACTCATGCAT CATGAAACCC TCAGCCAGCC TGCCCGAGTC CCTGTATTCA 900

CTGGTGATGA CCTACCAGAA CGCCCGCATG TGGTGGTACT TTGGCTGCTA CTTCTGCCTG 960

CCCATCCTCT TCACAGTCAC CTGCCAGCTG GTGACATGGC GGGTGCGAGG CCCTCCAGGG 1020

AGGAAGTCAG AGTGCAGGGC CAGCAAGCAC GAGCAGTGTG AGAGCCAGCT CAACAGCACC 1080

GTGGTGGGCC TGACCGTGGT CTACGCCTTC TGCACCCTCC CAGAGAACGT CTGCAACATC 1140

GTGGTGGCCT ACCTCTCCAC CGAGCTGACC CGCCAGACCC TGGACCTCCT GGGCCTCATC 1200

AACCAGTTCT CCACCTTCTT CAAGGGCGCC ATCACCCCAG TGCTGCTCCT TTGCATCTGC 1260

AGGCCGCTGG GCCAGGCCTT CCTGGACTGC TGCTGCTGCT GCTGCTGTGA GGAGTGCGGC 1320

GGGGCTTCGG AGGCCTCTGC TGCCAATGGG TCGGACAACA AGCTCAAGAC CGAGGTGTCC 1380

TCTTCCATCT ACTTCCACAA GCCCAGGGAG TCACCCCCAC TCCTGCCCCT GGGCACACCT 1440

TGCTGA 1446 

(83) INFORMATION FOR SEQ ID NO: 82: 

25 (i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 481 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 

Met Arg Trp Leu Trp Pro Leu Ala Val Ser Leu Ala Val Ile Leu Ala
1 5 10 15

Val Gly Leu Ser Arg Val Ser Gly Gly Ala Pro Leu His Leu Gly Arg
20 25 30

His Arg Ala Glu Thr Gln Glu Gln Gln Ser Arg Ser Lys Arg Gly Thr
35 40 45

Glu Asp Glu Glu Ala Lys Gly Val Gln Gln Tyr Val Pro Glu Glu Trp
50 55 60

Ala Glu Tyr Pro Arg Pro Ile His Pro Ala Gly Leu Gln Pro Thr Lys
65 70 75 80

Pro Leu Val Ala Thr Ser Pro Asn Pro Asp Lys Asp Gly Gly Thr Pro
85 90 95

Asp Ser Gly Gln Glu Leu Arg Gly Asn Leu Thr Gly Ala Pro Gly Gln
100 105 110

Arg Leu Gln Ile Gln Asn Pro Leu Tyr Pro Val Thr Glu Ser Ser Tyr
115 120 125

Ser Ala Tyr Ala Ile Met Leu Leu Ala Leu Val Val Phe Ala Val Gly
130 135 140

Ile Val Gly Asn Leu Ser Val Met Cys Ile Val Trp His Ser Tyr Tyr
145 150 155 160

Leu Lys Ser Ala Trp Asn Ser Ile Leu Ala Ser Leu Ala Leu Trp Asp
165 170 175

Phe Leu Val Leu Phe Phe Cys Leu Pro Ile Val Ile Phe Asn Glu Ile
180 185 190

Thr Lys Gln Arg Leu Leu Gly Asp Val Ser Cys Arg Ala Val Pro Phe
195 200 205

Met Glu Val Ser Ser Leu Gly Val Thr Thr Phe Ser Leu Cys Ala Leu
210 215 220

Gly Ile Asp Arg Phe His Val Ala Thr Ser Thr Leu Pro Lys Val Arg
225 230 235 240

Pro Ile Glu Arg Cys Gln Ser Ile Leu Ala Lys Leu Ala Val Ile Trp
245 250 255

Val Gly Ser Met Thr Leu Ala Val Pro Glu Leu Leu Leu Trp Gln Leu
260 265 270

Ala Gln Glu Pro Ala Pro Thr Met Gly Thr Leu Asp Ser Cys Ile Met
275 280 285

Lys Pro Ser Ala Ser Leu Pro Glu Ser Leu Tyr Ser Leu Val Met Thr
290 295 300

Tyr Gln Asn Ala Arg Met Trp Trp Tyr Phe Gly Cys Tyr Phe Cys Leu
305 310 315 320

Pro Ile Leu Phe Thr Val Thr Cys Gln Leu Val Thr Trp Arg Val Arg
325 330 335

Gly Pro Pro Gly Arg Lys Ser Glu Cys Arg Ala Ser Lys His Glu Gln
340 345 350

Cys Glu Ser Gln Leu Asn Ser Thr Val Val Gly Leu Thr Val Val Tyr
355 360 365

Ala Phe Cys Thr Leu Pro Glu Asn Val Cys Asn Ile Val Val Ala Tyr
370 375 380

Leu Ser Thr Glu Leu Thr Arg Gln Thr Leu Asp Leu Leu Gly Leu Ile
385 390 395 400

Asn Gln Phe Ser Thr Phe Phe Lys Gly Ala Ile Thr Pro Val Leu Leu
405 410 415

Leu Cys Ile Cys Arg Pro Leu Gly Gln Ala Phe Leu Asp Cys Cys Cys
420 425 430

Cys Cys Cys Cys Glu Glu Cys Gly Gly Ala Ser Glu Ala Ser Ala Ala
435 440 445

Asn Gly Ser Asp Asn Lys Leu Lys Thr Glu Val Ser Ser Ser Ile Tyr
450 455 460

Phe His Lys Pro Arg Glu Ser Pro Pro Leu Leu Pro Leu Gly Thr Pro
465 470 475 480

Cys



(84) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:

ATGTGGAACG CGACGCCCAG CG

(85) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

TCATGTATTA ATACTAGATT CT

(86) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:

TACCATGTGG AACGCGACGC CCAGCGAAGA GCCGGGGT

(87) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

CGGAATTCAT GTATTAATAC TAGATTCTGT CCAGGCCCG

(88) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1101 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTGGC CGACCTGGAC 60
TGGGATGCTT CCCCCGGCAA CGACTCGCTG GGCGACGAGC TGCTGCAGCT CTTCCCCGCG 120
CCGCTGCTGG CGGGCGTCAC AGCCACCTGC GTGGCACTCT TCGTGGTGGG TATCGCTGGC 180
AACCTGCTCA CCATGCTGGT GGTGTCGCGC TTCCGCGAGC TGCGCACCAC CACCAACCTC 240
TACCTGTCCA GCATGGCCTT CTCCGATCTG CTCATCTTCC TCTGCATGCC CCTGGACCTC 300
GTTCGCCTCT GGCAGTACCG GCCCTGGAAC TTCGGCGACC TCCTCTGCAA ACTCTTCCAA 360
TTCGTCAGTG AGAGCTGCAC CTACGCCACG GTGCTCACCA TCACAGCGCT GAGCGTCGAG 420
CGCTACTTCG CCATCTGCTT CCCACTCCGG GCCAAGGTGG TGGTCACCAA GGGGCGGGTG 480
AAGCTGGTCA TCTTCGTCAT CTGGGCCGTG GCCTTCTGCA GCGCCGGGCC CATCTTCGTG 540
CTAGTCGGGG TGGAGCACGA GAACGGCACC GACCCTTGGG ACACCAACGA GTGCCGCCCC 600
ACCGAGTTTG CGGTGCGCTC TGGACTGCTC ACGGTCATGG TGTGGGTGTC CAGCATCTTC 660
TTCTTCCTTC CTGTCTTCTG TCTCACGGTC CTCTACAGTC TCATCGGCAG GAAGCTGTGG 720
CGGAGGAGGC GCGGCGATGC TGTCGTGGGT GCCTCGCTCA GGGACCAGAA CCACAAGCAA 780
ACCGTGAAAA TGCTGGCTGT AGTGGTGTTT GCCTTCATCC TCTGCTGGCT CCCCTTCCAC 840
GTAGGGCGAT ATTTATTTTC CAAATCCTTT GAGCCTGGCT CCTTGGAGAT TGCTCAGATC 900
AGCCAGTACT GCAACCTCGT GTCCTTTGTC CTCTTCTACC TCAGTGCTGC CATCAACCCC 960
ATTCTGTACA ACATCATGTC CAAGAAGTAC CGGGTGGCAG TGTTCAGACT TCTGGGATTC 1020
GAACCCTTCT CCCAGAGAAA GCTCTCCACT CTGAAAGATG AAAGTTCTCG GGCCTGGACA 1080
GAATCTAGTA TTAATACATG A 1101
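The primer pair of SEQ ID NO: 83 and SEQ ID NO: 84 appears to correspond to the two ends of the coding sequence of SEQ ID NO: 87: the forward primer matches its first 22 nucleotides, and the reverse primer is the reverse complement of its last 22. The short Python sketch below checks this using only end segments transcribed from the entries above; the helper names are illustrative and are not part of the original disclosure.

```python
# Illustrative check (hypothetical helper names); sequences transcribed from
# SEQ ID NOs: 83, 84 and 87 of this listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Drop the spaces that group the listing into blocks of ten."""
    return seq.replace(" ", "")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


SEQ83 = clean("ATGTGGAACG CGACGCCCAG CG")    # forward primer
SEQ84 = clean("TCATGTATTA ATACTAGATT CT")    # reverse primer
# First 30 and last 31 nucleotides of SEQ ID NO: 87.
SEQ87_START = clean("ATGTGGAACG CGACGCCCAG CGAAGAGCCG")
SEQ87_END = clean("GGCCTGGACA GAATCTAGTA TTAATACATG A")

forward_ok = SEQ87_START.startswith(SEQ83)
reverse_ok = SEQ87_END.endswith(reverse_complement(SEQ84))
print(forward_ok, reverse_ok)  # True True
```

The same comparison, applied after stripping their added 5' tails, would be expected to hold for the longer primers of SEQ ID NO: 85 and SEQ ID NO: 86.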

(89) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 366 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

Met Trp Asn Ala Thr Pro Ser Glu Glu Pro Gly Phe Asn Leu Thr Leu
1 5 10 15

Ala Asp Leu Asp Trp Asp Ala Ser Pro Gly Asn Asp Ser Leu Gly Asp
20 25 30






WO 00/22129 



PCT/US99/23938 




Glu Leu Leu Gln Leu Phe Pro Ala Pro Leu Leu Ala Gly Val Thr Ala
35 40 45

Thr Cys Val Ala Leu Phe Val Val Gly Ile Ala Gly Asn Leu Leu Thr
50 55 60

Met Leu Val Val Ser Arg Phe Arg Glu Leu Arg Thr Thr Thr Asn Leu
65 70 75 80

Tyr Leu Ser Ser Met Ala Phe Ser Asp Leu Leu Ile Phe Leu Cys Met
85 90 95

Pro Leu Asp Leu Val Arg Leu Trp Gln Tyr Arg Pro Trp Asn Phe Gly
100 105 110

Asp Leu Leu Cys Lys Leu Phe Gln Phe Val Ser Glu Ser Cys Thr Tyr
115 120 125

Ala Thr Val Leu Thr Ile Thr Ala Leu Ser Val Glu Arg Tyr Phe Ala
130 135 140

Ile Cys Phe Pro Leu Arg Ala Lys Val Val Val Thr Lys Gly Arg Val
145 150 155 160

Lys Leu Val Ile Phe Val Ile Trp Ala Val Ala Phe Cys Ser Ala Gly
165 170 175

Pro Ile Phe Val Leu Val Gly Val Glu His Glu Asn Gly Thr Asp Pro
180 185 190

Trp Asp Thr Asn Glu Cys Arg Pro Thr Glu Phe Ala Val Arg Ser Gly
195 200 205

Leu Leu Thr Val Met Val Trp Val Ser Ser Ile Phe Phe Phe Leu Pro
210 215 220

Val Phe Cys Leu Thr Val Leu Tyr Ser Leu Ile Gly Arg Lys Leu Trp
225 230 235 240

Arg Arg Arg Arg Gly Asp Ala Val Val Gly Ala Ser Leu Arg Asp Gln
245 250 255

Asn His Lys Gln Thr Val Lys Met Leu Ala Val Val Val Phe Ala Phe
260 265 270

Ile Leu Cys Trp Leu Pro Phe His Val Gly Arg Tyr Leu Phe Ser Lys
275 280 285

Ser Phe Glu Pro Gly Ser Leu Glu Ile Ala Gln Ile Ser Gln Tyr Cys
290 295 300

Asn Leu Val Ser Phe Val Leu Phe Tyr Leu Ser Ala Ala Ile Asn Pro
305 310 315 320

Ile Leu Tyr Asn Ile Met Ser Lys Lys Tyr Arg Val Ala Val Phe Arg
325 330 335

Leu Leu Gly Phe Glu Pro Phe Ser Gln Arg Lys Leu Ser Thr Leu Lys
340 345 350

Asp Glu Ser Ser Arg Ala Trp Thr Glu Ser Ser Ile Asn Thr
355 360 365
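As a further illustrative check, the opening codons of SEQ ID NO: 87 can be translated with the standard genetic code and compared with the first sixteen residues of SEQ ID NO: 88. This is a minimal sketch assuming the standard nuclear code and an in-frame start at nucleotide 1; the function names are hypothetical and not drawn from the disclosure.

```python
# Standard genetic code, with codons enumerated in TCAG order for the
# first, second and third base ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Three-letter codes as used in the protein entries of this listing.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon."""
    dna = dna.replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def to_one_letter(residues: str) -> str:
    """Convert space-separated three-letter residue codes to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in residues.split())


# Nucleotides 1-48 of SEQ ID NO: 87 and residues 1-16 of SEQ ID NO: 88.
dna_1_48 = "ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTG"
protein_1_16 = "Met Trp Asn Ala Thr Pro Ser Glu Glu Pro Gly Phe Asn Leu Thr Leu"

print(translate(dna_1_48))                                 # MWNATPSEEPGFNLTL
print(translate(dna_1_48) == to_one_letter(protein_1_16))  # True
```

Applied to the full 1101-nucleotide sequence, the same function would be expected to end in "...ESSINT*", the terminal TGA having no counterpart among the 366 residues of SEQ ID NO: 88.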

(90) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:

GCAAGCTTGT GCCCTCACCA AGCCATGCGA GCC 33

(91) INFORMATION FOR SEQ ID NO: 90:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:

CGGAATTCAG CAATGAGTTC CGACAGAAGC 30

(92) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1842 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91:

ATGCGAGCCC CGGGCGCGCT TCTCGCCCGC ATGTCGCGGC TACTGCTTCT GCTACTGCTC 60
AAGGTGTCTG CCTCTTCTGC CCTCGGGGTC GCCCCTGCGT CCAGAAACGA AACTTGTCTG 120
GGGGAGAGCT GTGCACCTAC AGTGATCCAG CGCCGCGGCA GGGACGCCTG GGGACCGGGA 180
AATTCTGCAA GAGACGTTCT GCGAGCCCGA GCACCCAGGG AGGAGCAGGG GGCAGCGTTT 240
CTTGCGGGAC CCTCCTGGGA CCTGCCGGCG GCCCCGGGCC GTGACCCGGC TGCAGGCAGA 300
GGGGCGGAGG CGTCGGCAGC CGGACCCCCG GGACCTCCAA CCAGGCCACC TGGCCCCTGG 360
AGGTGGAAAG GTGCTCGGGG TCAGGAGCCT TCTGAAACTT TGGGGAGAGG GAACCCCACG 420
GCCCTCCAGC TCTTCCTTCA GATCTCAGAG GAGGAAGAGA AGGGTCCCAG AGGCGCTGGC 480
ATTTCCGGGC GTAGCCAGGA GCAGAGTGTG AAGACAGTCC CCGGAGCCAG CGATCTTTTT 540
TACTGGCCAA GGAGAGCCGG GAAACTCCAG GGTTCCCACC ACAAGCCCCT GTCCAAGACG 600
GCCAATGGAC TGGCGGGGCA CGAAGGGTGG ACAATTGCAC TCCCGGGCCG GGCGCTGGCC 660
CAGAATGGAT CCTTGGGTGA AGGAATCCAT GAGCCTGGGG GTCCCCGCCG GGGAAACAGC 720
ACGAACCGGC GTGTGAGACT GAAGAACCCC TTCTACCCGC TGACCCAGGA GTCCTATGGA 780
GCCTACGCGG TCATGTGTCT GTCCGTGGTG ATCTTCGGGA CCGGCATCAT TGGCAACCTG 840
GCGGTGATGA GCATCGTGTG CCACAACTAC TACATGCGGA GCATCTCCAA CTCCCTCTTG 900
GCCAACCTGG CCTTCTGGGA CTTTCTCATC ATCTTCTTCT GCCTTCCGCT GGTCATCTTC 960
CACGAGCTGA CCAAGAAGTG GCTGCTGGAG GACTTCTCCT GCAAGATCGT GCCCTATATA 1020
GAGGTCGCTT CTCTGGGAGT CACCACTTTC ACCTTATGTG CTCTGTGCAT AGACCGCTTC 1080
CGTGCTGCCA CCAACGTACA GATGTACTAC GAAATGATCG AAAACTGTTC CTCAACAACT 1140
GCCAAACTTG CTGTTATATG GGTGGGAGCT CTATTGTTAG CACTTCCAGA AGTTGTTCTC 1200
CGCCAGCTGA GCAAGGAGGA TTTGGGGTTT AGTGGCCGAG CTCCGGCAGA AAGGTGCATT 1260
ATTAAGATCT CTCCTGATTT ACCAGACACC ATCTATGTTC TAGCCCTCAC CTACGACAGT 1320
GCGAGACTGT GGTGGTATTT TGGCTGTTAC TTTTGTTTGC CCACGCTTTT CACCATCACC 1380
TGCTCTCTAG TGACTGCGAG GAAAATCCGC AAAGCAGAGA AAGCCTGTAC CCGAGGGAAT 1440
AAACGGCAGA TTCAACTAGA GAGTCAGATG AACTGTACAG TAGTGGCACT GACCATTTTA 1500
TATGGATTTT GCATTATTCC TGAAAATATC TGCAACATTG TTACTGCCTA CATGGCTACA 1560
GGGGTTTCAC AGCAGACAAT GGACCTCCTT AATATCATCA GCCAGTTCCT TTTGTTCTTT 1620
AAGTCCTGTG TCACCCCAGT CCTCCTTTTC TGTCTCTGCA AACCCTTCAG TCGGGCCTTC 1680
ATGGAGTGCT GCTGCTGTTG CTGTGAGGAA TGCATTCAGA AGTCTTCAAC GGTGACCAGT 1740
GATGACAATG ACAACGAGTA CACCACGGAA CTCGAACTCT CGCCTTTCAG TACCATACGC 1800
CGTGAAATGT CCACTTTTGC TTCTGTCGGA ACTCATTGCT GA 1842



(93) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 613 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:

Met Arg Ala Pro Gly Ala Leu Leu Ala Arg Met Ser Arg Leu Leu Leu
1 5 10 15

Leu Leu Leu Leu Lys Val Ser Ala Ser Ser Ala Leu Gly Val Ala Pro
20 25 30

Ala Ser Arg Asn Glu Thr Cys Leu Gly Glu Ser Cys Ala Pro Thr Val
35 40 45

Ile Gln Arg Arg Gly Arg Asp Ala Trp Gly Pro Gly Asn Ser Ala Arg
50 55 60

Asp Val Leu Arg Ala Arg Ala Pro Arg Glu Glu Gln Gly Ala Ala Phe
65 70 75 80

Leu Ala Gly Pro Ser Trp Asp Leu Pro Ala Ala Pro Gly Arg Asp Pro
85 90 95

Ala Ala Gly Arg Gly Ala Glu Ala Ser Ala Ala Gly Pro Pro Gly Pro
100 105 110

Pro Thr Arg Pro Pro Gly Pro Trp Arg Trp Lys Gly Ala Arg Gly Gln
115 120 125

Glu Pro Ser Glu Thr Leu Gly Arg Gly Asn Pro Thr Ala Leu Gln Leu
130 135 140

Phe Leu Gln Ile Ser Glu Glu Glu Glu Lys Gly Pro Arg Gly Ala Gly
145 150 155 160

Ile Ser Gly Arg Ser Gln Glu Gln Ser Val Lys Thr Val Pro Gly Ala
165 170 175

Ser Asp Leu Phe Tyr Trp Pro Arg Arg Ala Gly Lys Leu Gln Gly Ser
180 185 190

His His Lys Pro Leu Ser Lys Thr Ala Asn Gly Leu Ala Gly His Glu
195 200 205

Gly Trp Thr Ile Ala Leu Pro Gly Arg Ala Leu Ala Gln Asn Gly Ser
210 215 220

Leu Gly Glu Gly Ile His Glu Pro Gly Gly Pro Arg Arg Gly Asn Ser
225 230 235 240

Thr Asn Arg Arg Val Arg Leu Lys Asn Pro Phe Tyr Pro Leu Thr Gln
245 250 255

Glu Ser Tyr Gly Ala Tyr Ala Val Met Cys Leu Ser Val Val Ile Phe
260 265 270

Gly Thr Gly Ile Ile Gly Asn Leu Ala Val Met Ser Ile Val Cys His
275 280 285

Asn Tyr Tyr Met Arg Ser Ile Ser Asn Ser Leu Leu Ala Asn Leu Ala
290 295 300

Phe Trp Asp Phe Leu Ile Ile Phe Phe Cys Leu Pro Leu Val Ile Phe
305 310 315 320

His Glu Leu Thr Lys Lys Trp Leu Leu Glu Asp Phe Ser Cys Lys Ile
325 330 335

Val Pro Tyr Ile Glu Val Ala Ser Leu Gly Val Thr Thr Phe Thr Leu
340 345 350

Cys Ala Leu Cys Ile Asp Arg Phe Arg Ala Ala Thr Asn Val Gln Met
355 360 365

Tyr Tyr Glu Met Ile Glu Asn Cys Ser Ser Thr Thr Ala Lys Leu Ala
370 375 380

Val Ile Trp Val Gly Ala Leu Leu Leu Ala Leu Pro Glu Val Val Leu
385 390 395 400

Arg Gln Leu Ser Lys Glu Asp Leu Gly Phe Ser Gly Arg Ala Pro Ala
405 410 415

Glu Arg Cys Ile Ile Lys Ile Ser Pro Asp Leu Pro Asp Thr Ile Tyr
420 425 430

Val Leu Ala Leu Thr Tyr Asp Ser Ala Arg Leu Trp Trp Tyr Phe Gly
435 440 445

Cys Tyr Phe Cys Leu Pro Thr Leu Phe Thr Ile Thr Cys Ser Leu Val
450 455 460

Thr Ala Arg Lys Ile Arg Lys Ala Glu Lys Ala Cys Thr Arg Gly Asn
465 470 475 480

Lys Arg Gln Ile Gln Leu Glu Ser Gln Met Asn Cys Thr Val Val Ala
485 490 495

Leu Thr Ile Leu Tyr Gly Phe Cys Ile Ile Pro Glu Asn Ile Cys Asn
500 505 510

Ile Val Thr Ala Tyr Met Ala Thr Gly Val Ser Gln Gln Thr Met Asp
515 520 525

Leu Leu Asn Ile Ile Ser Gln Phe Leu Leu Phe Phe Lys Ser Cys Val
530 535 540

Thr Pro Val Leu Leu Phe Cys Leu Cys Lys Pro Phe Ser Arg Ala Phe
545 550 555 560

Met Glu Cys Cys Cys Cys Cys Cys Glu Glu Cys Ile Gln Lys Ser Ser
565 570 575

Thr Val Thr Ser Asp Asp Asn Asp Asn Glu Tyr Thr Thr Glu Leu Glu
580 585 590

Leu Ser Pro Phe Ser Thr Ile Arg Arg Glu Met Ser Thr Phe Ala Ser
595 600 605

Val Gly Thr His Cys
610

(94) INFORMATION FOR SEQ ID NO: 93:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:

CAGAATTCAG AGAAAAAAAG TGAATATGGT TTTT 34

(95) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:

TTGGATCCCT GGTGCATAAC AATTGAAAGA AT 32

(96) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1248 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

ATGGTTTTTG CTCACAGAAT GGATAACAGC AAGCCACATT TGATTATTCC TACACTTCTG 60
GTGCCCCTCC AAAACCGCAG CTGCACTGAA ACAGCCACAC CTCTGCCAAG CCAATACCTG 120
ATGGAATTAA GTGAGGAGCA CAGTTGGATG AGCAACCAAA CAGACCTTCA CTATGTGCTG 180
AAACCCGGGG AAGTGGCCAC AGCCAGCATC TTCTTTGGGA TTCTGTGGTT GTTTTCTATC 240
TTCGGCAATT CCCTGGTTTG TTTGGTCATC CATAGGAGTA GGAGGACTCA GTCTACCACC 300
AACTACTTTG TGGTCTCCAT GGCATGTGCT GACCTTCTCA TCAGCGTTGC CAGCACGCCT 360
TTCGTCCTGC TCCAGTTCAC CACTGGAAGG TGGACGCTGG GTAGTGCAAC GTGCAAGGTT 420
GTGCGATATT TTCAATATCT CACTCCAGGT GTCCAGATCT ACGTTCTCCT CTCCATCTGC 480
ATAGACCGGT TCTACACCAT CGTCTATCCT CTGAGCTTCA AGGTGTCCAG AGAAAAAGCC 540
AAGAAAATGA TTGCGGCATC GTGGATCTTT GATGCAGGCT TTGTGACCCC TGTGCTCTTT 600
TTCTATGGCT CCAACTGGGA CAGTCATTGT AACTATTTCC TCCCCTCCTC TTGGGAAGGC 660
ACTGCCTACA CTGTCATCCA CTTCTTGGTG GGCTTTGTGA TTCCATCTGT CCTCATAATT 720
TTATTTTACC AAAAGGTCAT AAAATATATT TGGAGAATAG GCACAGATGG CCGAACGGTG 780
AGGAGGACAA TGAACATTGT CCCTCGGACA AAAGTGAAAA CTATCAAGAT GTTCCTCATT 840
TTAAATCTGT TGTTTTTGCT CTCCTGGCTG CCTTTTCATG TAGCTCAGCT ATGGCACCCC 900
CATGAACAAG ACTATAAGAA AAGTTCCCTT GTTTTCACAG CTATCACATG GATATCCTTT 960
AGTTCTTCAG CCTCTAAACC TACTCTGTAT TCAATTTATA ATGCCAATTT TCGGAGAGGG 1020
ATGAAAGAGA CTTTTTGCAT GTCCTCTATG AAATGTTACC GAAGCAATGC CTATACTATC 1080
ACAACAAGTT CAAGGATGGC CAAAAAAAAC TACGTTGGCA TTTCAGAAAT CCCTTCCATG 1140
GCCAAAACTA TTACCAAAGA CTCGATCTAT GACTCATTTG ACAGAGAAGC CAAGGAAAAA 1200
AAGCTTGCTT GGCCCATTAA CTCAAATCCA CCAAATACTT TTGTCTAA 1248

(97) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 415 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

Met Val Phe Ala His Arg Met Asp Asn Ser Lys Pro His Leu Ile Ile
1 5 10 15

Pro Thr Leu Leu Val Pro Leu Gln Asn Arg Ser Cys Thr Glu Thr Ala
20 25 30

Thr Pro Leu Pro Ser Gln Tyr Leu Met Glu Leu Ser Glu Glu His Ser
35 40 45

Trp Met Ser Asn Gln Thr Asp Leu His Tyr Val Leu Lys Pro Gly Glu
50 55 60

Val Ala Thr Ala Ser Ile Phe Phe Gly Ile Leu Trp Leu Phe Ser Ile
65 70 75 80

Phe Gly Asn Ser Leu Val Cys Leu Val Ile His Arg Ser Arg Arg Thr
85 90 95

Gln Ser Thr Thr Asn Tyr Phe Val Val Ser Met Ala Cys Ala Asp Leu
100 105 110

Leu Ile Ser Val Ala Ser Thr Pro Phe Val Leu Leu Gln Phe Thr Thr
115 120 125

Gly Arg Trp Thr Leu Gly Ser Ala Thr Cys Lys Val Val Arg Tyr Phe
130 135 140

Gln Tyr Leu Thr Pro Gly Val Gln Ile Tyr Val Leu Leu Ser Ile Cys
145 150 155 160

Ile Asp Arg Phe Tyr Thr Ile Val Tyr Pro Leu Ser Phe Lys Val Ser
165 170 175

Arg Glu Lys Ala Lys Lys Met Ile Ala Ala Ser Trp Ile Phe Asp Ala
180 185 190

Gly Phe Val Thr Pro Val Leu Phe Phe Tyr Gly Ser Asn Trp Asp Ser
195 200 205

His Cys Asn Tyr Phe Leu Pro Ser Ser Trp Glu Gly Thr Ala Tyr Thr
210 215 220

Val Ile His Phe Leu Val Gly Phe Val Ile Pro Ser Val Leu Ile Ile
225 230 235 240

Leu Phe Tyr Gln Lys Val Ile Lys Tyr Ile Trp Arg Ile Gly Thr Asp
245 250 255

Gly Arg Thr Val Arg Arg Thr Met Asn Ile Val Pro Arg Thr Lys Val
260 265 270

Lys Thr Ile Lys Met Phe Leu Ile Leu Asn Leu Leu Phe Leu Leu Ser
275 280 285

Trp Leu Pro Phe His Val Ala Gln Leu Trp His Pro His Glu Gln Asp
290 295 300

Tyr Lys Lys Ser Ser Leu Val Phe Thr Ala Ile Thr Trp Ile Ser Phe
305 310 315 320

Ser Ser Ser Ala Ser Lys Pro Thr Leu Tyr Ser Ile Tyr Asn Ala Asn
325 330 335

Phe Arg Arg Gly Met Lys Glu Thr Phe Cys Met Ser Ser Met Lys Cys
340 345 350

Tyr Arg Ser Asn Ala Tyr Thr Ile Thr Thr Ser Ser Arg Met Ala Lys
355 360 365

Lys Asn Tyr Val Gly Ile Ser Glu Ile Pro Ser Met Ala Lys Thr Ile
370 375 380

Thr Lys Asp Ser Ile Tyr Asp Ser Phe Asp Arg Glu Ala Lys Glu Lys
385 390 395 400

Lys Leu Ala Trp Pro Ile Asn Ser Asn Pro Pro Asn Thr Phe Val
405 410 415

(98) INFORMATION FOR SEQ ID NO: 97:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:

GGAAAGCTTA ACGATCCCCA GGAGCAACAT

(99) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:

CTGGGATCCT ACGAGAGCAT TTTTCACACA G 31

(100) INFORMATION FOR SEQ ID NO: 99:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1842 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:

ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 60
CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 120
GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGG 180
AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240
CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 300
TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 360
GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 420
AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480
CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC 540
AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600
CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660
CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATTTTCT AACCATGTTT 720
GTGATCTTCC TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780
GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840
TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900
TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCCCT 960
GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020
CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080
ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140
CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200
TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260
GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320
CCTGCCTCTG TCCATTTCAA GGGTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380
AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440
CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAGTG CTGCCACCAG CCACCCTAAA 1500
CCCATCAAGC CAGCTACCAG CCATGCTGAG CCCACCACTG CTGACTATCC CAAGCCTGCC 1560
ACTACCAGCC ACCCTAAGCC CGCTGCTGCT GACAACCCTG AGCTCTCTGC CTCCCATTGC 1620
CCCGAGATCC CTGCCATTGC CCACCCTGTG TCTGACGACA GTGACCTCCC TGAGTCGGCC 1680
TCTAGCCCTG CCGCTGGGCC CACCAAGCCT GCTGCCAGCC AGCTGGAGTC TGACACCATC 1740
GCTGACCTTC CTGACCCTAC TGTAGTCACT ACCAGTACCA ATGATTACCA TGATGTCGTG 1800
GTTGTTGATG TTGAAGATGA TCCTGATGAA ATGGCTGTGT GA 1842

(101) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 613 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140

Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Phe Leu Thr Met Phe
225 230 235 240

Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Pro
305 310 315 320

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val
355 360 365

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly
370 375 380

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala
385 390 395 400

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly
405 410 415

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro
420 425 430

Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Gly
435 440 445

Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Ser Ala Ala Thr
485 490 495

Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr
500 505 510

Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala
515 520 525

Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro
530 535 540

Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala
545 550 555 560

Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu
565 570 575

Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser
580 585 590

Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro
595 600 605

Asp Glu Met Ala Val
610

(102) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:

TCCAAGCTTC GCCATGGGAC ATAACGGGAG CT 32

(103) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:

CGTGAATTCC AAGAATTTAC AATCCTTGCT 30

(104) INFORMATION FOR SEQ ID NO: 103:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1548 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:

ATGGGACATA ACGGGAGCTG GATCTCTCCA AATGCCAGCG AGCCGCACAA CGCGTCCGGC 60
GCCGAGGCTG CGGGTGTGAA CCGCAGCGCG CTCGGGGAGT TCGGCGAGGC GCAGCTGTAC 120
CGCCAGTTCA CCACCACCGT GCAGGTCGTC ATCTTCATAG GCTCGCTGCT CGGAAACTTC 180
ATGGTGTTAT GGTCAACTTG CCGCACAACC GTGTTCAAAT CTGTCACCAA CAGGTTCATT 240
AAAAACCTGG CCTGCTCGGG GATTTGTGCC AGCCTGGTCT GTGTGCCCTT CGACATCATC 300
CTCAGCACCA GTCCTCACTG TTGCTGGTGG ATCTACACCA TGCTCTTCTG CAAGGTCGTC 360
AAATTTTTGC ACAAAGTATT CTGCTCTGTG ACCATCCTCA GCTTCCCTGC TATTGCTTTG 420
GACAGGTACT ACTCAGTCCT CTATCCACTG GAGAGGAAAA TATCTGATGC CAAGTCCCGT 480
GAACTGGTGA TGTACATCTG GGCCCATGCA GTGGTGGCCA GTGTCCCTGT GTTTGCAGTA 540
ACCAATGTGG CTGACATCTA TGCCACGTCC ACCTGCACGG AAGTCTGGAG CAACTCCTTG 600
GGCCACCTGG TGTACGTTCT GGTGTATAAC ATCACCACGG TCATTGTGCC TGTGGTGGTG 660
GTGTTCCTCT TCTTGATACT GATCCGACGG GCCCTGAGTG CCAGCCAGAA GAAGAAGGTC 720
ATCATAGCAG CGCTCCGGAC CCCACAGAAC ACCATCTCTA TTCCCTATGC CTCCCAGCGG 780
GAGGCCGAGC TGCACGCCAC CCTGCTCTCC ATGGTGATGG TCTTCATCTT GTGTAGCGTG 840
CCCTATGCCA CCCTGGTCGT CTACCAGACT GTGCTCAATG TCCCTGACAC TTCCGTCTTC 900
TTGCTGCTCA CTGCTGTTTG GCTGCCCAAA GTCTCCCTGC TGGCAAACCC TGTTCTCTTT 960
CTTACTGTGA ACAAATCTGT CCGCAAGTGC TTGATAGGGA CCCTGGTGCA ACTACACCAC 1020
CGGTACAGTC GCCGTAATGT GGTCAGTACA GGGAGTGGCA TGGCTGAGGC CAGCCTGGAA 1080
CCCAGCATAC GCTCGGGTAG CCAGCTCCTG GAGATGTTCC ACATTGGGCA GCAGCAGATC 1140
TTTAAGCCCA CAGAGGATGA GGAAGAGAGT GAGGCCAAGT ACATTGGCTC AGCTGACTTC 1200
CAGGCCAAGG AGATATTTAG CACCTGCCTC GAGGGAGAGC AGGGGCCACA GTTTGCGCCC 1260
TCTGCCCCAC CCCTGAGCAC AGTGGACTCT GTATCCCAGG TGGCACCGGC AGCCCCTGTG 1320
GAACCTGAAA CATTCCCTGA TAAGTATTCC CTGCAGTTTG GCTTTGGGCC TTTTGAGTTG 1380
CCTCCTCAGT GGCTCTCAGA GACCCGAAAC AGCAAGAAGC GGCTGCTTCC CCCCTTGGGC 1440
AACACCCCAG AAGAGCTGAT CCAGACAAAG GTGCCCAAGG TAGGCAGGGT GGAGCGGAAG 1500
ATGAGCAGAA ACAATAAAGT GAGCATTTTT CCAAAGGTGG ATTCCTAG 1548

(105) INFORMATION FOR SEQ ID NO: 104:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 515 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

Met Gly His Asn Gly Ser Trp Ile Ser Pro Asn Ala Ser Glu Pro His
1 5 10 15

Asn Ala Ser Gly Ala Glu Ala Ala Gly Val Asn Arg Ser Ala Leu Gly
20 25 30

Glu Phe Gly Glu Ala Gln Leu Tyr Arg Gln Phe Thr Thr Thr Val Gln
35 40 45

Val Val Ile Phe Ile Gly Ser Leu Leu Gly Asn Phe Met Val Leu Trp
50 55 60

Ser Thr Cys Arg Thr Thr Val Phe Lys Ser Val Thr Asn Arg Phe Ile
65 70 75 80

Lys Asn Leu Ala Cys Ser Gly Ile Cys Ala Ser Leu Val Cys Val Pro
85 90 95

Phe Asp Ile Ile Leu Ser Thr Ser Pro His Cys Cys Trp Trp Ile Tyr
100 105 110

Thr Met Leu Phe Cys Lys Val Val Lys Phe Leu His Lys Val Phe Cys
115 120 125

Ser Val Thr Ile Leu Ser Phe Pro Ala Ile Ala Leu Asp Arg Tyr Tyr
130 135 140

Ser Val Leu Tyr Pro Leu Glu Arg Lys Ile Ser Asp Ala Lys Ser Arg
145 150 155 160

Glu Leu Val Met Tyr Ile Trp Ala His Ala Val Val Ala Ser Val Pro
165 170 175

Val Phe Ala Val Thr Asn Val Ala Asp Ile Tyr Ala Thr Ser Thr Cys
180 185 190

Thr Glu Val Trp Ser Asn Ser Leu Gly His Leu Val Tyr Val Leu Val
195 200 205

Tyr Asn Ile Thr Thr Val Ile Val Pro Val Val Val Val Phe Leu Phe
210 215 220

Leu Ile Leu Ile Arg Arg Ala Leu Ser Ala Ser Gln Lys Lys Lys Val
225 230 235 240

Ile Ile Ala Ala Leu Arg Thr Pro Gln Asn Thr Ile Ser Ile Pro Tyr
245 250 255

Ala Ser Gln Arg Glu Ala Glu Leu His Ala Thr Leu Leu Ser Met Val
260 265 270

Met Val Phe Ile Leu Cys Ser Val Pro Tyr Ala Thr Leu Val Val Tyr
275 280 285

Gln Thr Val Leu Asn Val Pro Asp Thr Ser Val Phe Leu Leu Leu Thr
290 295 300

Ala Val Trp Leu Pro Lys Val Ser Leu Leu Ala Asn Pro Val Leu Phe
305 310 315 320

Leu Thr Val Asn Lys Ser Val Arg Lys Cys Leu Ile Gly Thr Leu Val
325 330 335

Gln Leu His His Arg Tyr Ser Arg Arg Asn Val Val Ser Thr Gly Ser
340 345 350

Gly Met Ala Glu Ala Ser Leu Glu Pro Ser Ile Arg Ser Gly Ser Gln
355 360 365

Leu Leu Glu Met Phe His Ile Gly Gln Gln Gln Ile Phe Lys Pro Thr
370 375 380

Glu Asp Glu Glu Glu Ser Glu Ala Lys Tyr Ile Gly Ser Ala Asp Phe
385 390 395 400

Gln Ala Lys Glu Ile Phe Ser Thr Cys Leu Glu Gly Glu Gln Gly Pro
405 410 415

Gln Phe Ala Pro Ser Ala Pro Pro Leu Ser Thr Val Asp Ser Val Ser
420 425 430

Gln Val Ala Pro Ala Ala Pro Val Glu Pro Glu Thr Phe Pro Asp Lys
435 440 445

Tyr Ser Leu Gln Phe Gly Phe Gly Pro Phe Glu Leu Pro Pro Gln Trp
450 455 460

Leu Ser Glu Thr Arg Asn Ser Lys Lys Arg Leu Leu Pro Pro Leu Gly
465 470 475 480

Asn Thr Pro Glu Glu Leu Ile Gln Thr Lys Val Pro Lys Val Gly Arg
485 490 495

Val Glu Arg Lys Met Ser Arg Asn Asn Lys Val Ser Ile Phe Pro Lys
500 505 510

Val Asp Ser
515
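The stated lengths of each coding sequence in this section and of the protein it encodes are mutually consistent if every open reading frame ends in a single stop codon, i.e. bp = 3 × (aa + 1); for example, 1101 = 3 × 367 for SEQ ID NO: 87 and SEQ ID NO: 88. A hedged Python sketch of that arithmetic, using only the "(A) LENGTH" fields given above (the function name is illustrative):

```python
# (coding sequence ID, length in bp, encoded protein ID, length in aa),
# taken from the "(A) LENGTH" fields of the entries above.
PAIRS = [
    (87, 1101, 88, 366),
    (91, 1842, 92, 613),
    (95, 1248, 96, 415),
    (99, 1842, 100, 613),
    (103, 1548, 104, 515),
]


def orf_length_consistent(bp: int, aa: int) -> bool:
    """True if bp covers exactly aa sense codons plus one terminal stop codon."""
    return bp == 3 * (aa + 1)


for dna_id, bp, prot_id, aa in PAIRS:
    print(f"SEQ ID NO: {dna_id} ({bp} bp) -> SEQ ID NO: {prot_id} ({aa} aa):",
          orf_length_consistent(bp, aa))
```

Each line is expected to print True; in every case the extra three nucleotides are the terminal TGA, TAA or TAG codon, which has no corresponding residue in the protein entry.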

(106) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2 9 base pairs 

15 (B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 5: 
20 GGAGAATTCA CTAGGCGAGG CGCTCCATC 2 9 

(107) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3 0 base pairs 

(B) TYPE: nucleic acid 
25 (C) STRANDEDNESS: single 

( D ) TOPOLOGY : 1 inear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 
GGAGGATCCA GGAAACCTTA GGCCGAGTCC 3 0 

30 (108) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

ATGAATCGGC ACCATCTGCA GGATCACTTT CTGGAAATAG ACAAGAAGAA CTGCTGTGTG 60

TTCCGAGATG ACTTCATTGC CAAGGTGTTG CCGCCGGTGT TGGGGCTGGA GTTTATCTTT 120

GGGCTTCTGG GCAATGGCCT TGCCCTGTGG ATTTTCTGTT TCCACCTCAA GTCCTGGAAA 180

TCCAGCCGGA TTTTCCTGTT CAACCTGGCA GTAGCTGACT TTCTACTGAT CATCTGCCTG 240

CCGTTCGTGA TGGACTACTA TGTGCGGCGT TCAGACTGGA ACTTTGGGGA CATCCCTTGC 300

CGGCTGGTGC TCTTCATGTT TGCCATGAAC CGCCAGGGCA GCATCATCTT CCTCACGGTG 360

GTGGCGGTAG ACAGGTATTT CCGGGTGGTC CATCCCCACC ACGCCCTGAA CAAGATCTCC 420

AATTGGACAG CAGCCATCAT CTCTTGCCTT CTGTGGGGCA TCACTGTTGG CCTAACAGTC 480

CACCTCCTGA AGAAGAAGTT GCTGATCCAG AATGGCCCTG CAAATGTGTG CATCAGCTTC 540

AGCATCTGCC ATACCTTCCG GTGGCACGAA GCTATGTTCC TCCTGGAGTT CCTCCTGCCC 600

CTGGGCATCA TCCTGTTCTG CTCAGCCAGA ATTATCTGGA GCCTGCGGCA GAGACAAATG 660

GACCGGCATG CCAAGATCAA GAGAGCCATC ACCTTCATCA TGGTGGTGGC CATCGTCTTT 720

GTCATCTGCT TCCTTCCCAG CGTGGTTGTG CGGATCCGCA TCTTCTGGCT CCTGCACACT 780

TCGGGCACGC AGAATTGTGA AGTGTACCGC TCGGTGGACC TGGCGTTCTT TATCACTCTC 840

AGCTTCACCT ACATGAACAG CATGCTGGAC CCCGTGGTGT ACTACTTCTC CAGCCCATCC 900

TTTCCCAACT TCTTCTCCAC TTTGATCAAC CGCTGCCTCC AGAGGAAGAT GACAGGTGAG 960

CCAGATAATA ACCGCAGCAC GAGCGTCGAG CTCACAGGGG ACCCCAACAA AACCAGAGGC 1020

GCTCCAGAGG CGTTAATGGC CAACTCCGGT GAGCCATGGA GCCCCTCTTA TCTGGGCCCA 1080

ACCTCAAATA ACCATTCCAA GAAGGGACAT TGTCACCAAG AACCAGCATC TCTGGAGAAA 1140

CAGTTGGGCT GTTGCATCGA GTAA 1164 
(109) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 387 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:
Met Asn Arg His His Leu Gln Asp His Phe Leu Glu Ile Asp Lys Lys
1 5 10 15

Asn Cys Cys Val Phe Arg Asp Asp Phe Ile Ala Lys Val Leu Pro Pro
20 25 30

Val Leu Gly Leu Glu Phe Ile Phe Gly Leu Leu Gly Asn Gly Leu Ala
35 40 45

Leu Trp Ile Phe Cys Phe His Leu Lys Ser Trp Lys Ser Ser Arg Ile
50 55 60

Phe Leu Phe Asn Leu Ala Val Ala Asp Phe Leu Leu Ile Ile Cys Leu
65 70 75 80

Pro Phe Val Met Asp Tyr Tyr Val Arg Arg Ser Asp Trp Asn Phe Gly
85 90 95

Asp Ile Pro Cys Arg Leu Val Leu Phe Met Phe Ala Met Asn Arg Gln
100 105 110

Gly Ser Ile Ile Phe Leu Thr Val Val Ala Val Asp Arg Tyr Phe Arg
115 120 125

Val Val His Pro His His Ala Leu Asn Lys Ile Ser Asn Trp Thr Ala
130 135 140

Ala Ile Ile Ser Cys Leu Leu Trp Gly Ile Thr Val Gly Leu Thr Val
145 150 155 160

His Leu Leu Lys Lys Lys Leu Leu Ile Gln Asn Gly Pro Ala Asn Val
165 170 175

Cys Ile Ser Phe Ser Ile Cys His Thr Phe Arg Trp His Glu Ala Met
180 185 190

Phe Leu Leu Glu Phe Leu Leu Pro Leu Gly Ile Ile Leu Phe Cys Ser
195 200 205

Ala Arg Ile Ile Trp Ser Leu Arg Gln Arg Gln Met Asp Arg His Ala
210 215 220

Lys Ile Lys Arg Ala Ile Thr Phe Ile Met Val Val Ala Ile Val Phe
225 230 235 240

Val Ile Cys Phe Leu Pro Ser Val Val Val Arg Ile Arg Ile Phe Trp
245 250 255

Leu Leu His Thr Ser Gly Thr Gln Asn Cys Glu Val Tyr Arg Ser Val
260 265 270

Asp Leu Ala Phe Phe Ile Thr Leu Ser Phe Thr Tyr Met Asn Ser Met
275 280 285

Leu Asp Pro Val Val Tyr Tyr Phe Ser Ser Pro Ser Phe Pro Asn Phe
290 295 300

Phe Ser Thr Leu Ile Asn Arg Cys Leu Gln Arg Lys Met Thr Gly Glu
305 310 315 320

Pro Asp Asn Asn Arg Ser Thr Ser Val Glu Leu Thr Gly Asp Pro Asn
325 330 335

Lys Thr Arg Gly Ala Pro Glu Ala Leu Met Ala Asn Ser Gly Glu Pro
340 345 350

Trp Ser Pro Ser Tyr Leu Gly Pro Thr Ser Asn Asn His Ser Lys Lys
355 360 365

Gly His Cys His Gln Glu Pro Ala Ser Leu Glu Lys Gln Leu Gly Cys
370 375 380

Cys Ile Glu
385 
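
The nucleotide and protein records in this part of the listing come in pairs whose declared lengths are linked by simple arithmetic: SEQ ID NO: 107 has 1164 bp, or 388 codons, and its last codon (TAA) is a stop, which leaves the 387 amino acids declared for SEQ ID NO: 108. The same relation, residues = bp/3 - 1, holds for SEQ ID NOs: 113/114 (1212 bp, 403 aa), 117/118 (1098 bp, 365 aa), 121/122 (1416 bp, 471 aa), 125/126 (1377 bp, 458 aa) and 129/130 (1068 bp, 355 aa). The Python sketch below is not part of the filing; it only shows one way such a pair could be cross-checked. It assumes each DNA record is a single open reading frame read from base 1 with the standard genetic code, and all helper names are hypothetical.

```python
# Minimal cross-check of a nucleotide record against its protein record.
# Assumption: the listed DNA is one in-frame ORF starting at base 1.

# Standard genetic code, generated from the conventional TCAG codon ordering.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Three-letter residue codes used in the listing, mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def clean_dna(listing_text: str) -> str:
    """Keep only A/C/G/T, dropping block spacing and position counters."""
    return "".join(ch for ch in listing_text.upper() if ch in "ACGT")


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def three_letter_to_one(residues: str) -> str:
    """Convert a space-separated three-letter record into one-letter form."""
    return "".join(THREE_TO_ONE[r] for r in residues.split())


def expected_protein_length(orf_bp: int) -> int:
    """An ORF of n bp (stop codon included) encodes n/3 - 1 residues."""
    if orf_bp % 3:
        raise ValueError("ORF length should be a multiple of 3")
    return orf_bp // 3 - 1


if __name__ == "__main__":
    # First listing line of SEQ ID NO: 107 versus residues 1-20 of SEQ ID NO: 108.
    dna = clean_dna("ATGAATCGGC ACCATCTGCA GGATCACTTT CTGGAAATAG ACAAGAAGAA CTGCTGTGTG 60")
    protein = three_letter_to_one(
        "Met Asn Arg His His Leu Gln Asp His Phe Leu Glu Ile Asp Lys Lys "
        "Asn Cys Cys Val"
    )
    print(translate(dna) == protein)       # True
    print(expected_protein_length(1164))   # 387
```

Run as a script, it prints True and 387. A full record can be checked the same way by joining every DNA line through clean_dna and every residue line through three_letter_to_one before comparing.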

(110) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
ACCATGGCTT GCAATGGCAG TGCGGCCAGG GGGCACT 37

(111) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
CGACCAGGAC AAACAGCATC TTGGTCACTT GTCTCCGGC 39

(112) INFORMATION FOR SEQ ID NO: 111:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
GACCAAGATG CTGTTTGTCC TGGTCGTGGT GTTTGGCAT 39

(113) INFORMATION FOR SEQ ID NO: 112:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
CGGAATTCAG GATGGATCGG TCTCTTGCTG CGCCT 35

(114) INFORMATION FOR SEQ ID NO: 113:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1212 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
ATGGCTTGCA ATGGCAGTGC GGCCAGGGGG CACTTTGACC CTGAGGACTT GAACCTGACT 60 

GACGAGGCAC TGAGACTCAA GTACCTGGGG CCCCAGCAGA CAGAGCTGTT CATGCCCATC 120

TGTGCCACAT ACCTGCTGAT CTTCGTGGTG GGCGCTGTGG GCAATGGGCT GACCTGTCTG 180

GTCATCCTGC GCCACAAGGC CATGCGCACG CCTACCAACT ACTACCTCTT CAGCCTGGCC 240

GTGTCGGACC TGCTGGTGCT GCTGGTGGGC CTGCCCCTGG AGCTCTATGA GATGTGGCAC 300

AACTACCCCT TCCTGCTGGG CGTTGGTGGC TGCTATTTCC GCACGCTACT GTTTGAGATG 360

GTCTGCCTGG CCTCAGTGCT CAACGTCACT GCCCTGAGCG TGGAACGCTA TGTGGCCGTG 420

GTGCACCCAC TCCAGGCCAG GTCCATGGTG ACGCGGGCCC ATGTGCGCCG AGTGCTTGGG 480

GCCGTCTGGG GTCTTGCCAT GCTCTGCTCC CTGCCCAACA CCAGCCTGCA CGGCATCCGG 540

CAGCTGCACG TGCCCTGCCG GGGCCCAGTG CCAGACTCAG CTGTTTGCAT GCTGGTCCGC 600

CCACGGGCCC TCTACAACAT GGTAGTGCAG ACCACCGCGC TGCTCTTCTT CTGCCTGCCC 660

ATGGCCATCA TGAGCGTGCT CTACCTGCTC ATTGGGCTGC GACTGCGGCG GGAGAGGCTG 720

CTGCTCATGC AGGAGGCCAA GGGCAGGGGC TCTGCAGCAG CCAGGTCCAG ATACACCTGC 780

AGGCTCCAGC AGCACGATCG GGGCCGGAGA CAAGTGACCA AGATGCTGTT TGTCCTGGTC 840

GTGGTGTTTG GCATCTGCTG GGCCCCGTTC CACGCCGACC GCGTCATGTG GAGCGTCGTG 900

TCACAGTGGA CAGATGGCCT GCACCTGGCC TTCCAGCACG TGCACGTCAT CTCCGGCATC 960

TTCTTCTACC TGGGCTCGGC GGCCAACCCC GTGCTCTATA GCCTCATGTC CAGCCGCTTC 1020

CGAGAGACCT TCCAGGAGGC CCTGTGCCTC GGGGCCTGCT GCCATCGCCT CAGACCCCGC 1080

CACAGCTCCC ACAGCCTCAG CAGGATGACC ACAGGCAGCA CCCTGTGTGA TGTGGGCTCC 1140

CTGGGCAGCT GGGTCCACCC CCTGGCTGGG AACGATGGCC CAGAGGCGCA GCAAGAGACC 1200

GATCCATCCT GA 1212 
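
Each primer record states whether it is ANTI-SENSE. A primer marked NO reads like the coding strand, while one marked YES matches it only after reverse complementation. SEQ ID NOs: 109 and 112 illustrate this against SEQ ID NO: 113: bases 4 to 37 of SEQ ID NO: 109 are identical to bases 1 to 34 of SEQ ID NO: 113, and the reverse complement of SEQ ID NO: 112 begins with the last 29 bases of SEQ ID NO: 113, ending in the TGA stop codon. In both cases the remaining 5' bases form a tail that does not match the template. The Python sketch below reproduces that comparison; it is an illustration only, the function names are hypothetical, and it assumes a primer anneals through one contiguous 3'-terminal stretch.

```python
# Sketch: how much of a sense or antisense primer matches a coding strand.
# Assumption: annealing happens through one contiguous 3'-terminal stretch.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_match(template: str, primer: str, antisense: bool) -> int:
    """Length of the longest 3'-terminal stretch of `primer` found on `template`.

    Sense primers are compared directly with the coding strand; antisense
    primers are compared after reverse complementation. Any 5' tail that does
    not match (for example an added restriction site) is excluded.
    """
    template = template.upper()
    primer = primer.upper()
    for k in range(len(primer), 0, -1):
        stretch = primer[len(primer) - k:]  # 3'-terminal k bases
        site = reverse_complement(stretch) if antisense else stretch
        if site in template:
            return k
    return 0


if __name__ == "__main__":
    seq113_start = "ATGGCTTGCAATGGCAGTGCGGCCAGGGGGCACTTTGACC"  # bases 1-40
    seq113_end = "CAGAGGCGCAGCAAGAGACCGATCCATCCTGA"          # bases 1181-1212
    seq109 = "ACCATGGCTTGCAATGGCAGTGCGGCCAGGGGGCACT"         # ANTI-SENSE: NO
    seq112 = "CGGAATTCAGGATGGATCGGTCTCTTGCTGCGCCT"           # ANTI-SENSE: YES
    print(three_prime_match(seq113_start, seq109, antisense=False))  # 34
    print(three_prime_match(seq113_end, seq112, antisense=True))     # 29
```

Scanning from the longest candidate stretch downward returns the full matching length rather than the first short hit, so the output is 34 and 29.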
(115) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 403 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:114:

Met Ala Cys Asn Gly Ser Ala Ala Arg Gly His Phe Asp Pro Glu Asp
1 5 10 15

Leu Asn Leu Thr Asp Glu Ala Leu Arg Leu Lys Tyr Leu Gly Pro Gln
20 25 30

Gln Thr Glu Leu Phe Met Pro Ile Cys Ala Thr Tyr Leu Leu Ile Phe
35 40 45

Val Val Gly Ala Val Gly Asn Gly Leu Thr Cys Leu Val Ile Leu Arg
50 55 60

His Lys Ala Met Arg Thr Pro Thr Asn Tyr Tyr Leu Phe Ser Leu Ala
65 70 75 80

Val Ser Asp Leu Leu Val Leu Leu Val Gly Leu Pro Leu Glu Leu Tyr
85 90 95

Glu Met Trp His Asn Tyr Pro Phe Leu Leu Gly Val Gly Gly Cys Tyr
100 105 110

Phe Arg Thr Leu Leu Phe Glu Met Val Cys Leu Ala Ser Val Leu Asn
115 120 125

Val Thr Ala Leu Ser Val Glu Arg Tyr Val Ala Val Val His Pro Leu
130 135 140

Gln Ala Arg Ser Met Val Thr Arg Ala His Val Arg Arg Val Leu Gly
145 150 155 160

Ala Val Trp Gly Leu Ala Met Leu Cys Ser Leu Pro Asn Thr Ser Leu
165 170 175

His Gly Ile Arg Gln Leu His Val Pro Cys Arg Gly Pro Val Pro Asp
180 185 190

Ser Ala Val Cys Met Leu Val Arg Pro Arg Ala Leu Tyr Asn Met Val
195 200 205

Val Gln Thr Thr Ala Leu Leu Phe Phe Cys Leu Pro Met Ala Ile Met
210 215 220

Ser Val Leu Tyr Leu Leu Ile Gly Leu Arg Leu Arg Arg Glu Arg Leu
225 230 235 240

Leu Leu Met Gln Glu Ala Lys Gly Arg Gly Ser Ala Ala Ala Arg Ser
245 250 255

Arg Tyr Thr Cys Arg Leu Gln Gln His Asp Arg Gly Arg Arg Gln Val
260 265 270

Thr Lys Met Leu Phe Val Leu Val Val Val Phe Gly Ile Cys Trp Ala
275 280 285

Pro Phe His Ala Asp Arg Val Met Trp Ser Val Val Ser Gln Trp Thr
290 295 300

Asp Gly Leu His Leu Ala Phe Gln His Val His Val Ile Ser Gly Ile
305 310 315 320

Phe Phe Tyr Leu Gly Ser Ala Ala Asn Pro Val Leu Tyr Ser Leu Met
325 330 335

Ser Ser Arg Phe Arg Glu Thr Phe Gln Glu Ala Leu Cys Leu Gly Ala
340 345 350

Cys Cys His Arg Leu Arg Pro Arg His Ser Ser His Ser Leu Ser Arg
355 360 365

Met Thr Thr Gly Ser Thr Leu Cys Asp Val Gly Ser Leu Gly Ser Trp 
370 375 380 

Val His Pro Leu Ala Gly Asn Asp Gly Pro Glu Ala Gln Gln Glu Thr
385 390 395 400

Asp Pro Ser 

(116) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
GGAAGCTTCA GGCCCAAAGA TGGGGAACAT 30

(117) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:

GTGGATCCAC CCGCGGAGGA CCCAGGCTAG 30 

(118) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1098 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:117:

ATGGGGAACA TCACTGCAGA CAACTCCTCG ATGAGCTGTA CCATCGACCA TACCATCCAC 60

CAGACGCTGG CCCCGGTGGT CTATGTTACC GTGCTGGTGG TGGGCTTCCC GGCCAACTGC 120

CTGTCCCTCT ACTTCGGCTA CCTGCAGATC AAGGCCCGGA ACGAGCTGGG CGTGTACCTG 180

TGCAACCTGA CGGTGGCCGA CCTCTTCTAC ATCTGCTCGC TGCCCTTCTG GCTGCAGTAC 240

GTGCTGCAGC ACGACAACTG GTCTCACGGC GACCTGTCCT GCCAGGTGTG CGGCATCCTC 300

CTGTACGAGA ACATCTACAT CAGCGTGGGC TTCCTCTGCT GCATCTCCGT GGACCGCTAC 360

CTGGCTGTGG CCCATCCCTT CCGCTTCCAC CAGTTCCGGA CCCTGAAGGC GGCCGTCGGC 420

GTCAGCGTGG TCATCTGGGC CAAGGAGCTG CTGACCAGCA TCTACTTCCT GATGCACGAG 480

GAGGTCATCG AGGACGAGAA CCAGCACCGC GTGTGCTTTG AGCACTACCC CATCCAGGCA 540

TGGCAGCGCG CCATCAACTA CTACCGCTTC CTGGTGGGCT TCCTCTTCCC CATCTGCCTG 600

CTGCTGGCGT CCTACCAGGG CATCCTGCGC GCCGTGCGCC GGAGCCACGG CACCCAGAAG 660

AGCCGCAAGG ACCAGATCCA GCGGCTGGTG CTCAGCACCG TGGTCATCTT CCTGGCCTGC 720

TTCCTGCCCT ACCACGTGTT GCTGCTGGTG CGCAGCGTCT GGGAGGCCAG CTGCGACTTC 780

GCCAAGGGCG TTTTCAACGC CTACCACTTC TCCCTCCTGC TCACCAGCTT CAACTGCGTC 840

GCCGACCCCG TGCTCTACTG CTTCGTCAGC GAGACCACCC ACCGGGACCT GGCCCGCCTC 900

CGCGGGGCCT GCCTGGCCTT CCTCACCTGC TCCAGGACCG GCCGGGCCAG GGAGGCCTAC 960

CCGCTGGGTG CCCCCGAGGC CTCCGGGAAA AGCGGGGCCC AGGGTGAGGA GCCCGAGCTG 1020

TTGACCAAGC TCCACCCGGC CTTCCAGACC CCTAACTCGC CAGGGTCGGG CGGGTTCCCC 1080

ACGGGCAGGT TGGCCTAG 1098
(119) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 365 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:118:

Met Gly Asn Ile Thr Ala Asp Asn Ser Ser Met Ser Cys Thr Ile Asp
1 5 10 15

His Thr Ile His Gln Thr Leu Ala Pro Val Val Tyr Val Thr Val Leu
20 25 30

Val Val Gly Phe Pro Ala Asn Cys Leu Ser Leu Tyr Phe Gly Tyr Leu
35 40 45

Gln Ile Lys Ala Arg Asn Glu Leu Gly Val Tyr Leu Cys Asn Leu Thr
50 55 60

Val Ala Asp Leu Phe Tyr Ile Cys Ser Leu Pro Phe Trp Leu Gln Tyr
65 70 75 80

Val Leu Gln His Asp Asn Trp Ser His Gly Asp Leu Ser Cys Gln Val
85 90 95

Cys Gly Ile Leu Leu Tyr Glu Asn Ile Tyr Ile Ser Val Gly Phe Leu
100 105 110

Cys Cys Ile Ser Val Asp Arg Tyr Leu Ala Val Ala His Pro Phe Arg
115 120 125

Phe His Gln Phe Arg Thr Leu Lys Ala Ala Val Gly Val Ser Val Val
130 135 140

Ile Trp Ala Lys Glu Leu Leu Thr Ser Ile Tyr Phe Leu Met His Glu
145 150 155 160

Glu Val Ile Glu Asp Glu Asn Gln His Arg Val Cys Phe Glu His Tyr
165 170 175

Pro Ile Gln Ala Trp Gln Arg Ala Ile Asn Tyr Tyr Arg Phe Leu Val
180 185 190

Gly Phe Leu Phe Pro Ile Cys Leu Leu Leu Ala Ser Tyr Gln Gly Ile
195 200 205

Leu Arg Ala Val Arg Arg Ser His Gly Thr Gln Lys Ser Arg Lys Asp
210 215 220

Gln Ile Gln Arg Leu Val Leu Ser Thr Val Val Ile Phe Leu Ala Cys
225 230 235 240

Phe Leu Pro Tyr His Val Leu Leu Leu Val Arg Ser Val Trp Glu Ala
245 250 255

Ser Cys Asp Phe Ala Lys Gly Val Phe Asn Ala Tyr His Phe Ser Leu
260 265 270

Leu Leu Thr Ser Phe Asn Cys Val Ala Asp Pro Val Leu Tyr Cys Phe
275 280 285

Val Ser Glu Thr Thr His Arg Asp Leu Ala Arg Leu Arg Gly Ala Cys
290 295 300

Leu Ala Phe Leu Thr Cys Ser Arg Thr Gly Arg Ala Arg Glu Ala Tyr
305 310 315 320

Pro Leu Gly Ala Pro Glu Ala Ser Gly Lys Ser Gly Ala Gln Gly Glu
325 330 335

Glu Pro Glu Leu Leu Thr Lys Leu His Pro Ala Phe Gln Thr Pro Asn
340 345 350

Ser Pro Gly Ser Gly Gly Phe Pro Thr Gly Arg Leu Ala
355 360 365 

(120) INFORMATION FOR SEQ ID NO: 119:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
GACCTCGAGT CCTTCTACAC CTCATC 26

(121) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 
TGCTCTAGAT TCCAGATAGG TGAAAACTTG 30 

(122) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1416 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:

ATGGATATTC TTTGTGAAGA AAATACTTCT TTGAGCTCAA CTACGAACTC CCTAATGCAA 60

TTAAATGATG ACAACAGGCT CTACAGTAAT GACTTTAACT CCGGAGAAGC TAACACTTCT 120

GATGCATTTA ACTGGACAGT CGACTCTGAA AATCGAACCA ACCTTTCCTG TGAAGGGTGC 180

CTCTCACCGT CGTGTCTCTC CTTACTTCAT CTCCAGGAAA AAAACTGGTC TGCTTTACTG 240

ACAGCCGTAG TGATTATTCT AACTATTGCT GGAAACATAC TCGTCATCAT GGCAGTGTCC 300

CTAGAGAAAA AGCTGCAGAA TGCCACCAAC TATTTCCTGA TGTCACTTGC CATAGCTGAT 360

ATGCTGCTGG GTTTCCTTGT CATGCCCGTG TCCATGTTAA CCATCCTGTA TGGGTACCGG 420

TGGCCTCTGC CGAGCAAGCT TTGTGCAGTC TGGATTTACC TGGACGTGCT CTTCTCCACG 480

GCCTCCATCA TGCACCTCTG CGCCATCTCG CTGGACCGCT ACGTCGCCAT CCAGAATCCC 540

ATCCACCACA GCCGCTTCAA CTCCAGAACT AAGGCATTTC TGAAAATCAT TGCTGTTTGG 600

ACCATATCAG TAGGTATATC CATGCCAATA CCAGTCTTTG GGCTACAGGA CGATTCGAAG 660

GTCTTTAAGG AGGGGAGTTG CTTACTCGCC GATGATAACT TTGTCCTGAT CGGCTCTTTT 720

GTGTCATTTT TCATTCCCTT AACCATCATG GTGATCACCT ACTTTCTAAC TATCAAGTCA 780

CTCCAGAAAG AAGCTACTTT GTGTGTAAGT GATCTTGGCA CACGGGCCAA ATTAGCTTCT 840

TTCAGCTTCC TCCCTCAGAG TTCTTTGTCT TCAGAAAAGC TCTTCCAGCG GTCGATCCAT 900

AGGGAGCCAG GGTCCTACAC AGGCAGGAGG ACTATGCAGT CCATCAGCAA TGAGCAAAAG 960

GCATGCAAGG TGCTGGGCAT CGTCTTCTTC CTGTTTGTGG TGATGTGGTG CCCTTTCTTC 1020

ATCACAAACA TCATGGCCGT CATCTGCAAA GAGTCCTGCA ATGAGGATGT CATTGGGGCC 1080

CTGCTCAATG TGTTTGTTTG GATCGGTTAT CTCTCTTCAG CAGTCAACCC ACTAGTCTAC 1140

ACACTGTTCA ACAAGACCTA TAGGTCAGCC TTTTCACGGT ATATTCAGTG TCAGTACAAG 1200

GAAAACAAAA AACCATTGCA GTTAATTTTA GTGAACACAA TACCGGCTTT GGCCTACAAG 1260

TCTAGCCAAC TTCAAATGGG ACAAAAAAAG AATTCAAAGC AAGATGCCAA GACAACAGAT 1320

AATGACTGCT CAATGGTTGC TCTAGGAAAG CAGTATTCTG AAGAGGCTTC TAAAGACAAT 1380

AGCGACGGAG TGAATGAAAA GGTGAGCTGT GTGTGA 1416 
(123) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 471 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:122:

Met Asp Ile Leu Cys Glu Glu Asn Thr Ser Leu Ser Ser Thr Thr Asn
1 5 10 15

Ser Leu Met Gln Leu Asn Asp Asp Asn Arg Leu Tyr Ser Asn Asp Phe
20 25 30

Asn Ser Gly Glu Ala Asn Thr Ser Asp Ala Phe Asn Trp Thr Val Asp
35 40 45

Ser Glu Asn Arg Thr Asn Leu Ser Cys Glu Gly Cys Leu Ser Pro Ser
50 55 60

Cys Leu Ser Leu Leu His Leu Gln Glu Lys Asn Trp Ser Ala Leu Leu
65 70 75 80

Thr Ala Val Val Ile Ile Leu Thr Ile Ala Gly Asn Ile Leu Val Ile
85 90 95

Met Ala Val Ser Leu Glu Lys Lys Leu Gln Asn Ala Thr Asn Tyr Phe
100 105 110

Leu Met Ser Leu Ala Ile Ala Asp Met Leu Leu Gly Phe Leu Val Met
115 120 125

Pro Val Ser Met Leu Thr Ile Leu Tyr Gly Tyr Arg Trp Pro Leu Pro
130 135 140

Ser Lys Leu Cys Ala Val Trp Ile Tyr Leu Asp Val Leu Phe Ser Thr
145 150 155 160

Ala Ser Ile Met His Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala
165 170 175

Ile Gln Asn Pro Ile His His Ser Arg Phe Asn Ser Arg Thr Lys Ala
180 185 190

Phe Leu Lys Ile Ile Ala Val Trp Thr Ile Ser Val Gly Ile Ser Met
195 200 205

Pro Ile Pro Val Phe Gly Leu Gln Asp Asp Ser Lys Val Phe Lys Glu
210 215 220

Gly Ser Cys Leu Leu Ala Asp Asp Asn Phe Val Leu Ile Gly Ser Phe
225 230 235 240

Val Ser Phe Phe Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Phe Leu
245 250 255

Thr Ile Lys Ser Leu Gln Lys Glu Ala Thr Leu Cys Val Ser Asp Leu
260 265 270

Gly Thr Arg Ala Lys Leu Ala Ser Phe Ser Phe Leu Pro Gln Ser Ser
275 280 285

Leu Ser Ser Glu Lys Leu Phe Gln Arg Ser Ile His Arg Glu Pro Gly
290 295 300

Ser Tyr Thr Gly Arg Arg Thr Met Gln Ser Ile Ser Asn Glu Gln Lys
305 310 315 320

Ala Cys Lys Val Leu Gly Ile Val Phe Phe Leu Phe Val Val Met Trp
325 330 335

Cys Pro Phe Phe Ile Thr Asn Ile Met Ala Val Ile Cys Lys Glu Ser
340 345 350

Cys Asn Glu Asp Val Ile Gly Ala Leu Leu Asn Val Phe Val Trp Ile
355 360 365

Gly Tyr Leu Ser Ser Ala Val Asn Pro Leu Val Tyr Thr Leu Phe Asn
370 375 380

Lys Thr Tyr Arg Ser Ala Phe Ser Arg Tyr Ile Gln Cys Gln Tyr Lys
385 390 395 400

Glu Asn Lys Lys Pro Leu Gln Leu Ile Leu Val Asn Thr Ile Pro Ala
405 410 415

Leu Ala Tyr Lys Ser Ser Gln Leu Gln Met Gly Gln Lys Lys Asn Ser
420 425 430

Lys Gln Asp Ala Lys Thr Thr Asp Asn Asp Cys Ser Met Val Ala Leu
435 440 445

Gly Lys Gln Tyr Ser Glu Glu Ala Ser Lys Asp Asn Ser Asp Gly Val
450 455 460

Asn Glu Lys Val Ser Cys Val 
465 470 

(124) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:123:
GACCTCGAGG TTGCTTAAGA CTGAAGC 27

(125) INFORMATION FOR SEQ ID NO: 124:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:124:

ATTTCTAGAC ATATGTAGCT TGTACCG 27

(126) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1377 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:

ATGGTGAACC TGAGGAATGC GGTGCATTCA TTCCTTGTGC ACCTAATTGG CCTATTGGTT 60

TGGCAATGTG ATATTTCTGT GAGCCCAGTA GCAGCTATAG TAACTGACAT TTTCAATACC 120

TCCGATGGTG GACGCTTCAA ATTCCCAGAC GGGGTACAAA ACTGGCCAGC ACTTTCAATC 180

GTCATCATAA TAATCATGAC AATAGGTGGC AACATCCTTG TGATCATGGC AGTAAGCATG 240

GAAAAGAAAC TGCACAATGC CACCAATTAC TTCTTAATGT CCCTAGCCAT TGCTGATATG 300

CTAGTGGGAC TACTTGTCAT GCCCCTGTCT CTCCTGGCAA TCCTTTATGA TTATGTCTGG 360

CCACTACCTA GATATTTGTG CCCCGTCTGG ATTTCTTTAG ATGTTTTATT TTCAACAGCG 420

TCCATCATGC ACCTCTGCGC TATATCGCTG GATCGGTATG TAGCAATACG TAATCCTATT 480

GAGCATAGCC GTTTCAATTC GCGGACTAAG GCCATCATGA AGATTGCTAT TGTTTGGGCA 540

ATTTCTATAG GTGTATCAGT TCCTATCCCT GTGATTGGAC TGAGGGACGA AGAAAAGGTG 600

TTCGTGAACA ACACGACGTG CGTGCTCAAC GACCCAAATT TCGTTCTTAT TGGGTCCTTC 660

GTAGCTTTCT TCATACCGCT GACGATTATG GTGATTACGT ATTGCCTGAC CATCTACGTT 720

CTGCGCCGAC AAGCTTTGAT GTTACTGCAC GGCCACACCG AGGAACCGCC TGGACTAAGT 780

CTGGATTTCC TGAAGTGCTG CAAGAGGAAT ACGGCCGAGG AAGAGAACTC TGCAAACCCT 840

AACCAAGACC AGAACGCACG CCGAAGAAAG AAGAAGGAGA GACGTCCTAG GGGCACCATG 900

CAGGCTATCA ACAATGAAAG AAAAGCTTCG AAAGTCCTTG GGATTGTTTT CTTTGTGTTT 960

CTGATCATGT GGTGCCCATT TTTCATTACC AATATTCTGT CTGTTCTTTG TGAGAAGTCC 1020

TGTAACCAAA AGCTCATGGA AAAGCTTCTG AATGTGTTTG TTTGGATTGG CTATGTTTGT 1080

TCAGGAATCA ATCCTCTGGT GTATACTCTG TTCAACAAAA TTTACCGAAG GGCATTCTCC 1140

AACTATTTGC GTTGCAATTA TAAGGTAGAG AAAAAGCCTC CTGTCAGGCA GATTCCAAGA 1200

GTTGCCGCCA CTGCTTTGTC TGGGAGGGAG CTTAATGTTA ACATTTATCG GCATACCAAT 1260

GAACCGGTGA TCGAGAAAGC CAGTGACAAT GAGCCCGGTA TAGAGATGCA AGTTGAGAAT 1320

TTAGAGTTAC CAGTAAATCC CTCCAGTGTG GTTAGCGAAA GGATTAGCAG TGTGTGA 1377
(127) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:

Met Val Asn Leu Arg Asn Ala Val His Ser Phe Leu Val His Leu Ile
1 5 10 15

Gly Leu Leu Val Trp Gln Cys Asp Ile Ser Val Ser Pro Val Ala Ala
20 25 30

Ile Val Thr Asp Ile Phe Asn Thr Ser Asp Gly Gly Arg Phe Lys Phe
35 40 45

Pro Asp Gly Val Gln Asn Trp Pro Ala Leu Ser Ile Val Ile Ile Ile
50 55 60

Ile Met Thr Ile Gly Gly Asn Ile Leu Val Ile Met Ala Val Ser Met
65 70 75 80

Glu Lys Lys Leu His Asn Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala
85 90 95

Ile Ala Asp Met Leu Val Gly Leu Leu Val Met Pro Leu Ser Leu Leu
100 105 110

Ala Ile Leu Tyr Asp Tyr Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro
115 120 125

Val Trp Ile Ser Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His
130 135 140

Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile
145 150 155 160

Glu His Ser Arg Phe Asn Ser Arg Thr Lys Ala Ile Met Lys Ile Ala
165 170 175

Ile Val Trp Ala Ile Ser Ile Gly Val Ser Val Pro Ile Pro Val Ile
180 185 190

Gly Leu Arg Asp Glu Glu Lys Val Phe Val Asn Asn Thr Thr Cys Val
195 200 205

Leu Asn Asp Pro Asn Phe Val Leu Ile Gly Ser Phe Val Ala Phe Phe
210 215 220

Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Cys Leu Thr Ile Tyr Val
225 230 235 240

Leu Arg Arg Gln Ala Leu Met Leu Leu His Gly His Thr Glu Glu Pro
245 250 255

Pro Gly Leu Ser Leu Asp Phe Leu Lys Cys Cys Lys Arg Asn Thr Ala
260 265 270

Glu Glu Glu Asn Ser Ala Asn Pro Asn Gln Asp Gln Asn Ala Arg Arg
275 280 285

Arg Lys Lys Lys Glu Arg Arg Pro Arg Gly Thr Met Gln Ala Ile Asn
290 295 300

Asn Glu Arg Lys Ala Ser Lys Val Leu Gly Ile Val Phe Phe Val Phe
305 310 315 320

Leu Ile Met Trp Cys Pro Phe Phe Ile Thr Asn Ile Leu Ser Val Leu
325 330 335

Cys Glu Lys Ser Cys Asn Gln Lys Leu Met Glu Lys Leu Leu Asn Val
340 345 350

Phe Val Trp Ile Gly Tyr Val Cys Ser Gly Ile Asn Pro Leu Val Tyr
355 360 365

Thr Leu Phe Asn Lys Ile Tyr Arg Arg Ala Phe Ser Asn Tyr Leu Arg
370 375 380

Cys Asn Tyr Lys Val Glu Lys Lys Pro Pro Val Arg Gln Ile Pro Arg
385 390 395 400

Val Ala Ala Thr Ala Leu Ser Gly Arg Glu Leu Asn Val Asn Ile Tyr
405 410 415

Arg His Thr Asn Glu Pro Val Ile Glu Lys Ala Ser Asp Asn Glu Pro
420 425 430

Gly Ile Glu Met Gln Val Glu Asn Leu Glu Leu Pro Val Asn Pro Ser
435 440 445

Ser Val Val Ser Glu Arg Ile Ser Ser Val
450 455

(128) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:
GGTAAGCTTG GCAGTCCACG CCAGGCCTTC 30

(129) INFORMATION FOR SEQ ID NO: 128:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:128: 
TCCGAATTCT CTGTAGACAC AAGGCTTTGG 30 
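
The 26 to 30 nt primers appear as consecutive pairs (SEQ ID NOs: 105/106, 115/116, 119/120, 123/124 and 127/128), and each carries a standard six-base restriction site within its first ten bases: EcoRI (GAATTC) in 105 and 128, BamHI (GGATCC) in 106 and 116, HindIII (AAGCTT) in 115 and 127, XhoI (CTCGAG) in 119 and 123, and XbaI (TCTAGA) in 120 and 124. The listing itself does not say why; the likely, but here assumed, purpose is to add cloning sites to PCR products. The Python sketch below scans a primer for those five sites. The enzyme table is an assumption chosen to match these primers, and because all five sites are palindromic, scanning one strand is sufficient.

```python
# Sketch: report common six-base restriction sites found in a primer.
# The enzyme table is an illustrative assumption, not taken from the listing.

SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "XbaI": "TCTAGA",
}


def find_sites(primer: str) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based start) for every site occurrence, in order."""
    primer = primer.upper()
    hits = []
    for enzyme, site in SITES.items():
        start = primer.find(site)
        while start != -1:
            hits.append((enzyme, start))
            start = primer.find(site, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    primers = {
        105: "GGAGAATTCACTAGGCGAGGCGCTCCATC",
        106: "GGAGGATCCAGGAAACCTTAGGCCGAGTCC",
        119: "GACCTCGAGTCCTTCTACACCTCATC",
        120: "TGCTCTAGATTCCAGATAGGTGAAAACTTG",
    }
    for seq_id, primer in primers.items():
        print(seq_id, find_sites(primer))
```

Run as a script, it prints one hit per primer, for example `105 [('EcoRI', 3)]`. Overlapping occurrences are still reported because each search restarts one base after the previous hit.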
(130) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:
ATGGATCAGT TCCCTGAATC AGTGACAGAA AACTTTGAGT ACGATGATTT GGCTGAGGCC 60

TGTTATATTG GGGACATCGT GGTCTTTGGG ACTGTGTTCC TGTCCATATT CTACTCCGTC 120

ATCTTTGCCA TTGGCCTGGT GGGAAATTTG TTGGTAGTGT TTGCCCTCAC CAACAGCAAG 180

AAGCCCAAGA GTGTCACCGA CATTTACCTC CTGAACCTGG CCTTGTCTGA TCTGCTGTTT 240

GTAGCCACTT TGCCCTTCTG GACTCACTAT TTGATAAATG AAAAGGGCCT CCACAATGCC 300

ATGTGCAAAT TCACTACCGC CTTCTTCTTC ATCGGCTTTT TTGGAAGCAT ATTCTTCATC 360

ACCGTCATCA GCATTGATAG GTACCTGGCC ATCGTCCTGG CCGCCAACTC CATGAACAAC 420

CGGACCGTGC AGCATGGCGT CACCATCAGC CTAGGCGTCT GGGCAGCAGC CATTTTGGTG 480

GCAGCACCCC AGTTCATGTT CACAAAGCAG AAAGAAAATG AATGCCTTGG TGACTACCCC 540

GAGGTCCTCC AGGAAATCTG GCCCGTGCTC CGCAATGTGG AAACAAATTT TCTTGGCTTC 600

CTACTCCCCC TGCTCATTAT GAGTTATTGC TACTTCAGAA TCATCCAGAC GCTGTTTTCC 660

TGCAAGAACC ACAAGAAAGC CAAAGCCATT AAACTGATCC TTCTGGTGGT CATCGTGTTT 720

TTCCTCTTCT GGACACCCTA CAACGTTATG ATTTTCCTGG AGACGCTTAA GCTCTATGAC 780

TTCTTTCCCA GTTGTGACAT GAGGAAGGAT CTGAGGCTGG CCCTCAGTGT GACTGAGACG 840

GTTGCATTTA GCCATTGTTG CCTGAATCCT CTCATCTATG CATTTGCTGG GGAGAAGTTC 900

AGAAGATACC TTTACCACCT GTATGGGAAA TGCCTGGCTG TCCTGTGTGG GCGCTCAGTC 960

CACGTTGATT TCTCCTCATC TGAATCACAA AGGAGCAGGC ATGGAAGTGT TCTGAGCAGC 1020

AATTTTACTT ACCACACGAG TGATGGAGAT GCATTGCTCC TTCTCTGA 1068

(131) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

Met Asp Gln Phe Pro Glu Ser Val Thr Glu Asn Phe Glu Tyr Asp Asp
1 5 10 15

Leu Ala Glu Ala Cys Tyr Ile Gly Asp Ile Val Val Phe Gly Thr Val
20 25 30

Phe Leu Ser Ile Phe Tyr Ser Val Ile Phe Ala Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Val Val Phe Ala Leu Thr Asn Ser Lys Lys Pro Lys Ser
50 55 60

Val Thr Asp Ile Tyr Leu Leu Asn Leu Ala Leu Ser Asp Leu Leu Phe
65 70 75 80

Val Ala Thr Leu Pro Phe Trp Thr His Tyr Leu Ile Asn Glu Lys Gly
85 90 95

Leu His Asn Ala Met Cys Lys Phe Thr Thr Ala Phe Phe Phe Ile Gly
100 105 110

Phe Phe Gly Ser Ile Phe Phe Ile Thr Val Ile Ser Ile Asp Arg Tyr
115 120 125

Leu Ala Ile Val Leu Ala Ala Asn Ser Met Asn Asn Arg Thr Val Gln
130 135 140

His Gly Val Thr Ile Ser Leu Gly Val Trp Ala Ala Ala Ile Leu Val
145 150 155 160

Ala Ala Pro Gln Phe Met Phe Thr Lys Gln Lys Glu Asn Glu Cys Leu
165 170 175

Gly Asp Tyr Pro Glu Val Leu Gln Glu Ile Trp Pro Val Leu Arg Asn
180 185 190

Val Glu Thr Asn Phe Leu Gly Phe Leu Leu Pro Leu Leu Ile Met Ser
195 200 205



WO 00/22129 PCT/US99/23938

Tyr Cys Tyr Phe Arg Ile Ile Gln Thr Leu Phe Ser Cys Lys Asn His
210 215 220

Lys Lys Ala Lys Ala Ile Lys Leu Ile Leu Leu Val Val Ile Val Phe
225 230 235 240

Phe Leu Phe Trp Thr Pro Tyr Asn Val Met Ile Phe Leu Glu Thr Leu
245 250 255

Lys Leu Tyr Asp Phe Phe Pro Ser Cys Asp Met Arg Lys Asp Leu Arg
260 265 270

Leu Ala Leu Ser Val Thr Glu Thr Val Ala Phe Ser His Cys Cys Leu
275 280 285

Asn Pro Leu Ile Tyr Ala Phe Ala Gly Glu Lys Phe Arg Arg Tyr Leu
290 295 300

Tyr His Leu Tyr Gly Lys Cys Leu Ala Val Leu Cys Gly Arg Ser Val
305 310 315 320

His Val Asp Phe Ser Ser Ser Glu Ser Gln Arg Ser Arg His Gly Ser
325 330 335

Val Leu Ser Ser Asn Phe Thr Tyr His Thr Ser Asp Gly Asp Ala Leu
340 345 350

Leu Leu Leu
355

(132) INFORMATION FOR SEQ ID NO: 131:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:
GATCTCCAGT AGGCATAAGT GGACAATTCT GG 32

(133) INFORMATION FOR SEQ ID NO: 132:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
CTCCTTCGGT CCTCCTATCG TTGTCAGAAG 30

(134) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
AGAAGGCCAA GATCGCGCGG CTGGCCCTCA 30

(135) INFORMATION FOR SEQ ID NO: 134:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:134:
CGGCGCCACC GCACGAAAAA GCTCATCTTC 30

(136) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:
GCCAAGAAGC GGGTGAAGTT CCTGGTGGTG GCA 33

(137) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
CAGGCGGAAG GTGAAAGTCC TGGTCCTCGT 30

(138) INFORMATION FOR SEQ ID NO: 137:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

CGGCGCCTGC GGGCCAAGCG GCTGGTGGTG GTG 33

(139) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
CCAAGCACAA AGCCAAGAAA GTGACCATCA C 31

(140) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
GCGCCGGCGC ACCAAATGCT TGCTGGTGGT 30

(141) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
CAAAAAGCTG AAGAAATCTA AGAAGATCAT CTTTATTGTC G 41

(142) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:

CAAGACCAAG GCAAAACGCA TGATCGCCAT 30

(143) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

GTCAAGGAGA AGTCCAAAAG GATCATCATC 30

(144) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
CGCCGCGTGC GGGCCAAGCA GCTCCTGCTC 30

(145) INFORMATION FOR SEQ ID NO: 144:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144: 
CCTGATAAGC GCTATAAAAT GGTCCTGTTT CGA 33 
(146) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
GAAAGACAAA AGAGAGTCAA GAGGATGTCT TTATTG 36

(147) INFORMATION FOR SEQ ID NO: 146:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:

CGGAGAAAGA GGGTGAAACG CACAGCCATC GCC 33 
(148) INFORMATION FOR SEQ ID NO: 147: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:
AAGCTTCAGC GGGCCAAGGC ACTGGTCACC 30
(149) INFORMATION FOR SEQ ID NO: 148:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:
CAGCGGCAGA AGGCAAAAAG GGTGGCCATC 30

(150) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149:
CGGCAGAAGG CGAAGCGCAT GATCCTCGCG 30

(151) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:

GAGCGCAACA AGGCCAAAAA GGTGATCATC 30

(152) INFORMATION FOR SEQ ID NO: 151: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
GGTGTAAACA AAAAGGCTAA AAACACAATT ATTCTTATT 39

(153) INFORMATION FOR SEQ ID NO: 152:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:
GAGAGCCAGC TCAAGAGCAC CGTGGTG 27

(154) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153: 
CCACAAGCAA ACCAAGAAAA TGCTGGCTGT 30 

(155) INFORMATION FOR SEQ ID NO: 154: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:

CATCAAGTGT ATCATGTGCC AAGTACGCCC 30 

(156) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:
CTAGAGAGTC AGATGAAGTG TACAGTAGTG GCAC 34

(157) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156: 
CGGACAAAAG TGAAAACTAA AAAGATGTTC CTCATT 36 
(158) INFORMATION FOR SEQ ID NO: 157:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: 
GCTGAGGTTC GCAATAAACT AACCATGTTT GTG 33 
(159) INFORMATION FOR SEQ ID NO: 158:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:

GGGAGGCCGA GCTGAAAGCC ACCCTGCTC 29

(160) INFORMATION FOR SEQ ID NO: 159:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:
CAAGATCAAG AGAGCCAAAA CCTTCATCAT G 31
(161) INFORMATION FOR SEQ ID NO: 160:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:
CCGGAGACAA GTGAAGAAGA TGCTGTTTGT C 31 

(162) INFORMATION FOR SEQ ID NO: 161: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161: 
GCAAGGACCA GATCAAGCGG CTGGTGCTCA 30 

(163) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:

CAAGAAAGCC AAAGCCAAGA AACTGATCCT TCTG 34 

(164) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1068 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:

ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC 60

TATTACTCTC TGGAGTCTGA TTTGGAGGAG AAAGTCCAGC TGGGAGTTGT TCACTGGGTC 120

TCCCTGGTGT TATATTGTTT GGCTTTTGTT CTGGGAATTC CAGGAAATGC CATCGTCATT 180

TGGTTCACGG GGCTCAAGTG GAAGAAGACA GTCACCACTC TGTGGTTCCT CAATCTAGCC 240

ATTGCGGATT TCATTTTTCT TCTCTTTCTG CCCCTGTACA TCTCCTATGT GGCCATGAAT 300

TTCCACTGGC CCTTTGGCAT CTGGCTGTGC AAAGCCAATT CCTTCACTGC CCAGTTGAAC 360

ATGTTTGCCA GTGTTTTTTT CCTGACAGTG ATCAGCCTGG ACCACTATAT CCACTTGATC 420

CATCCTGTCT TATCTCATCG GCATCGAACC CTCAAGAACT CTCTGATTGT CATTATATTC 480

ATCTGGCTTT TGGCTTCTCT AATTGGCGGT CCTGCCCTGT ACTTCCGGGA CACTGTGGAG 540

TTCAATAATC ATACTCTTTG CTATAACAAT TTTCAGAAGC ATGATCCTGA CCTCACTTTG 600

ATCAGGCACC ATGTTCTGAC TTGGGTGAAA TTTATCATTG GCTATCTCTT CCCTTTGCTA 660

ACAATGAGTA TTTGCTACTT GTGTCTCATC TTCAAGGTGA AGAAGCGAAC AGTCCTGATC 720

TCCAGTAGGC ATAAGTGGAC AATTCTGGTT GTGGTTGTGG CCTTTGTGGT TTGCTGGACT 780

CCTTATCACC TGTTTAGCAT TTGGGAGCTC ACCATTCACC ACAATAGCTA TTCCCACCAT 840

GTGATGCAGG CTGGAATCCC CCTCTCCACT GGTTTGGCAT TCCTCAATAG TTGCTTGAAC 900

CCCATCCTTT ATGTCCTAAT TAGTAAGAAG TTCCAAGCTC GCTTCCGGTC CTCAGTTGCT 960

GAGATACTCA AGTACACACT GTGGGAAGTC AGCTGTTCTG GCACAGTGAG TGAACAGCTC 1020

AGGAACTCAG AAACCAAGAA TCTGTGTCTC CTGGAAACAG CTCAATAA 1068
(165) INFORMATION FOR SEQ ID NO: 164:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:

Met Glu Asp Leu Glu Glu Thr Leu Phe Glu Glu Phe Glu Asn Tyr Ser
1 5 10 15

Tyr Asp Leu Asp Tyr Tyr Ser Leu Glu Ser Asp Leu Glu Glu Lys Val
20 25 30

Gln Leu Gly Val Val His Trp Val Ser Leu Val Leu Tyr Cys Leu Ala
35 40 45

Phe Val Leu Gly Ile Pro Gly Asn Ala Ile Val Ile Trp Phe Thr Gly
50 55 60

Leu Lys Trp Lys Lys Thr Val Thr Thr Leu Trp Phe Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Phe Ile Phe Leu Leu Phe Leu Pro Leu Tyr Ile Ser Tyr
85 90 95

Val Ala Met Asn Phe His Trp Pro Phe Gly Ile Trp Leu Cys Lys Ala
100 105 110

Asn Ser Phe Thr Ala Gln Leu Asn Met Phe Ala Ser Val Phe Phe Leu
115 120 125

Thr Val Ile Ser Leu Asp His Tyr Ile His Leu Ile His Pro Val Leu
130 135 140

Ser His Arg His Arg Thr Leu Lys Asn Ser Leu Ile Val Ile Ile Phe
145 150 155 160

Ile Trp Leu Leu Ala Ser Leu Ile Gly Gly Pro Ala Leu Tyr Phe Arg
165 170 175

Asp Thr Val Glu Phe Asn Asn His Thr Leu Cys Tyr Asn Asn Phe Gln
180 185 190

Lys His Asp Pro Asp Leu Thr Leu Ile Arg His His Val Leu Thr Trp
195 200 205

Val Lys Phe Ile Ile Gly Tyr Leu Phe Pro Leu Leu Thr Met Ser Ile
210 215 220

Cys Tyr Leu Cys Leu Ile Phe Lys Val Lys Lys Arg Thr Val Leu Ile
225 230 235 240

Ser Ser Arg His Lys Trp Thr Ile Leu Val Val Val Val Ala Phe Val
245 250 255

Val Cys Trp Thr Pro Tyr His Leu Phe Ser Ile Trp Glu Leu Thr Ile
260 265 270

His His Asn Ser Tyr Ser His His Val Met Gln Ala Gly Ile Pro Leu
275 280 285

Ser Thr Gly Leu Ala Phe Leu Asn Ser Cys Leu Asn Pro Ile Leu Tyr
290 295 300

Val Leu Ile Ser Lys Lys Phe Gln Ala Arg Phe Arg Ser Ser Val Ala
305 310 315 320

Glu Ile Leu Lys Tyr Thr Leu Trp Glu Val Ser Cys Ser Gly Thr Val
325 330 335

Ser Glu Gln Leu Arg Asn Ser Glu Thr Lys Asn Leu Cys Leu Leu Glu
340 345 350

Thr Ala Gln
355
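A quick consistency check between SEQ ID NO: 163 and SEQ ID NO: 164, sketched in Python under the assumption that the nucleotide entry is read in frame from its first base with the standard genetic code (the helper names below are illustrative, not from the listing). Translating the first listed line (nucleotides 1 to 60) and the last (nucleotides 1021 to 1068) should reproduce residues 1 to 20 and 341 to 355 of the protein, the latter followed by a stop codon. The three-letter lookup also rejects OCR misreadings such as "Gin" or "lie" with a `KeyError`.

```python
# Standard genetic code, bases in T, C, A, G order (first base varies slowest).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment codon by codon ('*' = stop)."""
    seq = dna.replace(" ", "").upper()
    if len(seq) % 3:
        raise ValueError("fragment is not a whole number of codons")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


def one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter codes; unknown codes raise KeyError."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


# Nucleotides 1-60 and 1021-1068 of SEQ ID NO: 163 (both start on a codon boundary).
first_line = "ATGGAAGATT TGGAGGAAAC ATTATTTGAA GAATTTGAAA ACTATTCCTA TGACCTAGAC"
last_line = "AGGAACTCAG AAACCAAGAA TCTGTGTCTC CTGGAAACAG CTCAATAA"

# Residues 1-20 and 341-355 of SEQ ID NO: 164.
n_term = "Met Glu Asp Leu Glu Glu Thr Leu Phe Glu Glu Phe Glu Asn Tyr Ser Tyr Asp Leu Asp"
c_term = "Arg Asn Ser Glu Thr Lys Asn Leu Cys Leu Leu Glu Thr Ala Gln"

print(translate(first_line) == one_letter(n_term))       # True
print(translate(last_line) == one_letter(c_term) + "*")  # True (TAA stop)
```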

(166) INFORMATION FOR SEQ ID NO: 165:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1089 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:

ATGGGCAACC ACACGTGGGA GGGCTGCCAC GTGGACTCGC GCGTGGACCA CCTCTTTCCG 60

CCATCCCTCT ACATCTTTGT CATCGGCGTG GGGCTGCCCA CCAACTGCCT GGCTCTGTGG 120

GCGGCCTACC GCCAGGTGCA ACAGCGCAAC GAGCTGGGCG TCTACCTGAT GAACCTCAGC 180

ATCGCCGACC TGCTGTACAT CTGCACGCTG CCGCTGTGGG TGGACTACTT CCTGCACCAC 240

GACAACTGGA TCCACGGCCC CGGGTCCTGC AAGCTCTTTG GGTTCATCTT CTACACCAAT 300

ATCTACATCA GCATCGCCTT CCTGTGCTGC ATCTCGGTGG ACCGCTACCT GGCTGTGGCC 360

CACCCACTCC GCTTCGCCCG CCTGCGCCGC GTCAAGACCG CCGTGGCCGT GAGCTCCGTG 420

GTCTGGGCCA CGGAGCTGGG CGCCAACTCG GCGCCCCTGT TCCATGACGA GCTCTTCCGA 480

GACCGCTACA ACCACACCTT CTGCTTTGAG AAGTTCCCCA TGGAAGGCTG GGTGGCCTGG 540

ATGAACCTCT ATCGGGTGTT CGTGGGCTTC CTCTTCCCGT GGGCGCTCAT GCTGCTGTCG 600

TACCGGGGCA TCCTGCGGGC CGTGCGGGGC AGCGTGTCCA CCGAGCGCCA GGAGAAGGCC 660

AAGATCGCGC GGCTGGCCCT CAGCCTCATC GCCATCGTGC TGGTCTGCTT TGCGCCCTAT 720

CACGTGCTCT TGCTGTCCCG CAGCGCCATC TACCTGGGCC GCCCCTGGGA CTGCGGCTTC 780

GAGGAGCGCG TCTTTTCTGC ATACCACAGC TCACTGGCTT TCACCAGCCT CAACTGTGTG 840

GCGGACCCCA TCCTCTACTG CCTGGTCAAC GAGGGCGCCC GCAGCGATGT GGCCAAGGCC 900

CTGCACAACC TGCTCCGCTT TCTGGCCAGC GACAAGCCCC AGGAGATGGC CAATGCCTCG 960

CTCACCCTGG AGACCCCACT CACCTCCAAG AGGAACAGCA CAGCCAAAGC CATGACTGGC 1020

AGCTGGGCGG CCACTCCGCC TTCCCAGGGG GACCAGGTGC AGCTGAAGAT GCTGCCGCCA 1080

GCACAATGA 1089
(167) INFORMATION FOR SEQ ID NO: 166:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:

Met Gly Asn His Thr Trp Glu Gly Cys His Val Asp Ser Arg Val Asp
1 5 10 15

His Leu Phe Pro Pro Ser Leu Tyr Ile Phe Val Ile Gly Val Gly Leu
20 25 30

Pro Thr Asn Cys Leu Ala Leu Trp Ala Ala Tyr Arg Gln Val Gln Gln
35 40 45

Arg Asn Glu Leu Gly Val Tyr Leu Met Asn Leu Ser Ile Ala Asp Leu
50 55 60

Leu Tyr Ile Cys Thr Leu Pro Leu Trp Val Asp Tyr Phe Leu His His
65 70 75 80

Asp Asn Trp Ile His Gly Pro Gly Ser Cys Lys Leu Phe Gly Phe Ile
85 90 95

Phe Tyr Thr Asn Ile Tyr Ile Ser Ile Ala Phe Leu Cys Cys Ile Ser
100 105 110

Val Asp Arg Tyr Leu Ala Val Ala His Pro Leu Arg Phe Ala Arg Leu
115 120 125

Arg Arg Val Lys Thr Ala Val Ala Val Ser Ser Val Val Trp Ala Thr
130 135 140

Glu Leu Gly Ala Asn Ser Ala Pro Leu Phe His Asp Glu Leu Phe Arg
145 150 155 160

Asp Arg Tyr Asn His Thr Phe Cys Phe Glu Lys Phe Pro Met Glu Gly
165 170 175

Trp Val Ala Trp Met Asn Leu Tyr Arg Val Phe Val Gly Phe Leu Phe
180 185 190

Pro Trp Ala Leu Met Leu Leu Ser Tyr Arg Gly Ile Leu Arg Ala Val
195 200 205

Arg Gly Ser Val Ser Thr Glu Arg Gln Glu Lys Ala Lys Ile Ala Arg
210 215 220

Leu Ala Leu Ser Leu Ile Ala Ile Val Leu Val Cys Phe Ala Pro Tyr
225 230 235 240

His Val Leu Leu Leu Ser Arg Ser Ala Ile Tyr Leu Gly Arg Pro Trp
245 250 255

Asp Cys Gly Phe Glu Glu Arg Val Phe Ser Ala Tyr His Ser Ser Leu
260 265 270

Ala Phe Thr Ser Leu Asn Cys Val Ala Asp Pro Ile Leu Tyr Cys Leu
275 280 285

Val Asn Glu Gly Ala Arg Ser Asp Val Ala Lys Ala Leu His Asn Leu
290 295 300

Leu Arg Phe Leu Ala Ser Asp Lys Pro Gln Glu Met Ala Asn Ala Ser
305 310 315 320

Leu Thr Leu Glu Thr Pro Leu Thr Ser Lys Arg Asn Ser Thr Ala Lys
325 330 335

Ala Met Thr Gly Ser Trp Ala Ala Thr Pro Pro Ser Gln Gly Asp Gln
340 345 350

Val Gln Leu Lys Met Leu Pro Pro Ala Gln
355 360

(168) INFORMATION FOR SEQ ID NO: 167:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:

ATGGAGTCCT CAGGCAACCC AGAGAGCACC ACCTTTTTTT ACTATGACCT TCAGAGCCAG 60

CCGTGTGAGA ACCAGGCCTG GGTCTTTGCT ACCCTCGCCA CCACTGTCCT GTACTGCCTG 120

GTGTTTCTCC TCAGCCTAGT GGGCAACAGC CTGGTCCTGT GGGTCCTGGT GAAGTATGAG 180

AGCCTGGAGT CCCTCACCAA CATCTTCATC CTCAACCTGT GCCTCTCAGA CCTGGTGTTC 240

GCCTGCTTGT TGCCTGTGTG GATCTCCCCA TACCACTGGG GCTGGGTGCT GGGAGACTTC 300

CTCTGCAAAC TCCTCAATAT GATCTTCTCC ATCAGCCTCT ACAGCAGCAT CTTCTTCCTG 360

ACCATCATGA CCATCCACCG CTACCTGTCG GTAGTGAGCC CCCTCTCCAC CCTGCGCGTC 420

CCCACCCTCC GCTGCCGGGT GCTGGTGACC ATGGCTGTGT GGGTAGCCAG CATCCTGTCC 480

TCCATCCTCG ACACCATCTT CCACAAGGTG CTTTCTTCGG GCTGTGATTA TTCCGAACTC 540

ACGTGGTACC TCACCTCCGT CTACCAGCAC AACCTCTTCT TCCTGCTGTC CCTGGGGATT 600

ATCCTGTTCT GCTACGTGGA GATCCTCAGG ACCCTGTTCC GCTCACGCTC CAAGCGGCGC 660

CACCGCACGA AAAAGCTCAT CTTCGCCATC GTGGTGGCCT ACTTCCTCAG CTGGGGTCCC 720

TACAACTTCA CCCTGTTTCT GCAGACGCTG TTTCGGACCC AGATCATCCG GAGCTGCGAG 780

GCCAAACAGC AGCTAGAATA CGCCCTGCTC ATCTGCCGCA ACCTCGCCTT CTCCCACTGC 840

TGCTTTAACC CGGTGCTCTA TGTCTTCGTG GGGGTCAAGT TCCGCACACA CCTGAAACAT 900

GTTCTCCGGC AGTTCTGGTT CTGCCGGCTG CAGGCACCCA GCCCAGCCTC GATCCCCCAC 960

TCCCCTGGTG CCTTCGCCTA TGAGGGCGCC TCCTTCTACT GA 1002 

(169) INFORMATION FOR SEQ ID NO: 168:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:

Met Glu Ser Ser Gly Asn Pro Glu Ser Thr Thr Phe Phe Tyr Tyr Asp
1 5 10 15

Leu Gln Ser Gln Pro Cys Glu Asn Gln Ala Trp Val Phe Ala Thr Leu
20 25 30

Ala Thr Thr Val Leu Tyr Cys Leu Val Phe Leu Leu Ser Leu Val Gly
35 40 45

Asn Ser Leu Val Leu Trp Val Leu Val Lys Tyr Glu Ser Leu Glu Ser
50 55 60

Leu Thr Asn Ile Phe Ile Leu Asn Leu Cys Leu Ser Asp Leu Val Phe
65 70 75 80

Ala Cys Leu Leu Pro Val Trp Ile Ser Pro Tyr His Trp Gly Trp Val
85 90 95

Leu Gly Asp Phe Leu Cys Lys Leu Leu Asn Met Ile Phe Ser Ile Ser
100 105 110

Leu Tyr Ser Ser Ile Phe Phe Leu Thr Ile Met Thr Ile His Arg Tyr
115 120 125

Leu Ser Val Val Ser Pro Leu Ser Thr Leu Arg Val Pro Thr Leu Arg
130 135 140

Cys Arg Val Leu Val Thr Met Ala Val Trp Val Ala Ser Ile Leu Ser
145 150 155 160

Ser Ile Leu Asp Thr Ile Phe His Lys Val Leu Ser Ser Gly Cys Asp
165 170 175

Tyr Ser Glu Leu Thr Trp Tyr Leu Thr Ser Val Tyr Gln His Asn Leu
180 185 190

Phe Phe Leu Leu Ser Leu Gly Ile Ile Leu Phe Cys Tyr Val Glu Ile
195 200 205

Leu Arg Thr Leu Phe Arg Ser Arg Ser Lys Arg Arg His Arg Thr Lys
210 215 220

Lys Leu Ile Phe Ala Ile Val Val Ala Tyr Phe Leu Ser Trp Gly Pro
225 230 235 240

Tyr Asn Phe Thr Leu Phe Leu Gln Thr Leu Phe Arg Thr Gln Ile Ile
245 250 255

Arg Ser Cys Glu Ala Lys Gln Gln Leu Glu Tyr Ala Leu Leu Ile Cys
260 265 270

Arg Asn Leu Ala Phe Ser His Cys Cys Phe Asn Pro Val Leu Tyr Val
275 280 285

Phe Val Gly Val Lys Phe Arg Thr His Leu Lys His Val Leu Arg Gln
290 295 300

Phe Trp Phe Cys Arg Leu Gln Ala Pro Ser Pro Ala Ser Ile Pro His
305 310 315 320

Ser Pro Gly Ala Phe Ala Tyr Glu Gly Ala Ser Phe Tyr
325 330

(170) INFORMATION FOR SEQ ID NO: 169:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 987 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169:

ATGGACAACG CCTCGTTCTC GGAGCCCTGG CCCGCCAACG CATCGGGCCC GGACCCGGCG 60

CTGAGCTGCT CCAACGCGTC GACTCTGGCG CCGCTGCCGG CGCCGCTGGC GGTGGCTGTA 120

CCAGTTGTCT ACGCGGTGAT CTGCGCCGTG GGTCTGGCGG GCAACTCCGC CGTGCTGTAC 180

GTGTTGCTGC GGGCGCCCCG CATGAAGACC GTCACCAACC TGTTCATCCT CAACCTGGCC 240

ATCGCCGACG AGCTCTTCAC GCTGGTGCTG CCCATCAACA TCGCCGACTT CCTGCTGCGG 300

CAGTGGCCCT TCGGGGAGCT CATGTGCAAG CTCATCGTGG CTATCGACCA GTACAACACC 360

TTCTCCAGCC TCTACTTCCT CACCGTCATG AGCGCCGACC GCTACCTGGT GGTGTTGGCC 420

ACTGCGGAGT CGCGCCGGGT GGCCGGCCGC ACCTACAGCG CCGCGCGCGC GGTGAGCCTG 480

GCCGTGTGGG GGATCGTCAC ACTCGTCGTG CTGCCCTTCG CAGTCTTCGC CCGGCTAGAC 540

GACGAGCAGG GCCGGCGCCA GTGCGTGCTA GTCTTTCCGC AGCCCGAGGC CTTCTGGTGG 600

CGCGCGAGCC GCCTCTACAC GCTCGTGCTG GGCTTCGCCA TCCCCGTGTC CACCATCTGT 660

GTCCTCTATA CCACCCTGCT GTGCCGGCTG CATGCCATGC GGCTGGACAG CCACGCCAAG 720

GCCCTGGAGC GCGCCAAGAA GCGGGTGAAG TTCCTGGTGG TGGCAATCCT GGCGGTGTGC 780

CTCCTCTGCT GGACGCCCTA CCACCTGAGC ACCGTGGTGG CGCTCACCAC CGACCTCCCG 840

CAGACGCCGC TGGTCATCGC TATCTCCTAC TTCATCACCA GCCTGACGTA CGCCAACAGC 900

TGCCTCAACC CCTTCCTCTA CGCCTTCCTG GACGCCAGCT TCCGCAGGAA CCTCCGCCAG 960

CTGATAACTT GCCGCGCGGC AGCCTGA 987 
(171) INFORMATION FOR SEQ ID NO: 170:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 328 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:

Met Asp Asn Ala Ser Phe Ser Glu Pro Trp Pro Ala Asn Ala Ser Gly
1 5 10 15

Pro Asp Pro Ala Leu Ser Cys Ser Asn Ala Ser Thr Leu Ala Pro Leu
20 25 30

Pro Ala Pro Leu Ala Val Ala Val Pro Val Val Tyr Ala Val Ile Cys
35 40 45

Ala Val Gly Leu Ala Gly Asn Ser Ala Val Leu Tyr Val Leu Leu Arg
50 55 60

Ala Pro Arg Met Lys Thr Val Thr Asn Leu Phe Ile Leu Asn Leu Ala
65 70 75 80

Ile Ala Asp Glu Leu Phe Thr Leu Val Leu Pro Ile Asn Ile Ala Asp
85 90 95

Phe Leu Leu Arg Gln Trp Pro Phe Gly Glu Leu Met Cys Lys Leu Ile
100 105 110

Val Ala Ile Asp Gln Tyr Asn Thr Phe Ser Ser Leu Tyr Phe Leu Thr
115 120 125

Val Met Ser Ala Asp Arg Tyr Leu Val Val Leu Ala Thr Ala Glu Ser
130 135 140

Arg Arg Val Ala Gly Arg Thr Tyr Ser Ala Ala Arg Ala Val Ser Leu
145 150 155 160

Ala Val Trp Gly Ile Val Thr Leu Val Val Leu Pro Phe Ala Val Phe
165 170 175

Ala Arg Leu Asp Asp Glu Gln Gly Arg Arg Gln Cys Val Leu Val Phe
180 185 190

Pro Gln Pro Glu Ala Phe Trp Trp Arg Ala Ser Arg Leu Tyr Thr Leu
195 200 205

Val Leu Gly Phe Ala Ile Pro Val Ser Thr Ile Cys Val Leu Tyr Thr
210 215 220

Thr Leu Leu Cys Arg Leu His Ala Met Arg Leu Asp Ser His Ala Lys
225 230 235 240

Ala Leu Glu Arg Ala Lys Lys Arg Val Lys Phe Leu Val Val Ala Ile
245 250 255

Leu Ala Val Cys Leu Leu Cys Trp Thr Pro Tyr His Leu Ser Thr Val
260 265 270

Val Ala Leu Thr Thr Asp Leu Pro Gln Thr Pro Leu Val Ile Ala Ile
275 280 285

Ser Tyr Phe Ile Thr Ser Leu Thr Tyr Ala Asn Ser Cys Leu Asn Pro
290 295 300

Phe Leu Tyr Ala Phe Leu Asp Ala Ser Phe Arg Arg Asn Leu Arg Gln
305 310 315 320

Leu Ile Thr Cys Arg Ala Ala Ala
325

(172) INFORMATION FOR SEQ ID NO: 171:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:

ATGCAGGCCG CTGGGCACCC AGAGCCCCTT GACAGCAGGG GCTCCTTCTC CCTCCCCACG 60

ATGGGTGCCA ACGTCTCTCA GGACAATGGC ACTGGCCACA ATGCCACCTT CTCCGAGCCA 120

CTGCCGTTCC TCTATGTGCT CCTGCCCGCC GTGTACTCCG GGATCTGTGC TGTGGGGCTG 180

ACTGGCAACA CGGCCGTCAT CCTTGTAATC CTAAGGGCGC CCAAGATGAA GACGGTGACC 240

AACGTGTTCA TCCTGAACCT GGCCGTCGCC GACGGGCTCT TCACGCTGGT ACTGCCTGTC 300

AACATCGCGG AGCACCTGCT GCAGTACTGG CCCTTCGGGG AGCTGCTCTG CAAGCTGGTG 360

CTGGCCGTCG ACCACTACAA CATCTTCTCC AGCATCTACT TCCTAGCCGT GATGAGCGTG 420

GACCGATACC TGGTGGTGCT GGCCACCGTG AGGTCCCGCC ACATGCCCTG GCGCACCTAC 480

CGGGGGGCGA AGGTCGCCAG CCTGTGTGTC TGGCTGGGCG TCACGGTCCT GGTTCTGCCC 540

TTCTTCTCTT TCGCTGGCGT CTACAGCAAC GAGCTGCAGG TCCCAAGCTG TGGGCTGAGC 600

TTCCCGTGGC CCGAGCAGGT CTGGTTCAAG GCCAGCCGTG TCTACACGTT GGTCCTGGGC 660

TTCGTGCTGC CCGTGTGCAC CATCTGTGTG CTCTACACAG ACCTCCTGCG CAGGCTGCGG 720

GCCGTGCGGC TCCGCTCTGG AGCCAAGGCT CTAGGCAAGG CCAGGCGGAA GGTGAAAGTC 780

CTGGTCCTCG TCGTGCTGGC CGTGTGCCTC CTCTGCTGGA CGCCCTTCCA CCTGGCCTCT 840

GTCGTGGCCC TGACCACGGA CCTGCCCCAG ACCCCACTGG TCATCAGTAT GTCCTACGTC 900

ATCACCAGCC TCACGTACGC CAACTCGTGC CTGAACCCCT TCCTCTACGC CTTTCTAGAT 960

GACAACTTCC GGAAGAACTT CCGCAGCATA TTGCGGTGCT GA 1002

(173) INFORMATION FOR SEQ ID NO: 172:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172:

Met Gln Ala Ala Gly His Pro Glu Pro Leu Asp Ser Arg Gly Ser Phe
1 5 10 15

Ser Leu Pro Thr Met Gly Ala Asn Val Ser Gln Asp Asn Gly Thr Gly
20 25 30

His Asn Ala Thr Phe Ser Glu Pro Leu Pro Phe Leu Tyr Val Leu Leu
35 40 45

Pro Ala Val Tyr Ser Gly Ile Cys Ala Val Gly Leu Thr Gly Asn Thr
50 55 60

Ala Val Ile Leu Val Ile Leu Arg Ala Pro Lys Met Lys Thr Val Thr
65 70 75 80

Asn Val Phe Ile Leu Asn Leu Ala Val Ala Asp Gly Leu Phe Thr Leu
85 90 95

Val Leu Pro Val Asn Ile Ala Glu His Leu Leu Gln Tyr Trp Pro Phe
100 105 110

Gly Glu Leu Leu Cys Lys Leu Val Leu Ala Val Asp His Tyr Asn Ile
115 120 125

Phe Ser Ser Ile Tyr Phe Leu Ala Val Met Ser Val Asp Arg Tyr Leu
130 135 140

Val Val Leu Ala Thr Val Arg Ser Arg His Met Pro Trp Arg Thr Tyr
145 150 155 160

Arg Gly Ala Lys Val Ala Ser Leu Cys Val Trp Leu Gly Val Thr Val
165 170 175

Leu Val Leu Pro Phe Phe Ser Phe Ala Gly Val Tyr Ser Asn Glu Leu
180 185 190

Gln Val Pro Ser Cys Gly Leu Ser Phe Pro Trp Pro Glu Gln Val Trp
195 200 205

Phe Lys Ala Ser Arg Val Tyr Thr Leu Val Leu Gly Phe Val Leu Pro
210 215 220

Val Cys Thr Ile Cys Val Leu Tyr Thr Asp Leu Leu Arg Arg Leu Arg
225 230 235 240

Ala Val Arg Leu Arg Ser Gly Ala Lys Ala Leu Gly Lys Ala Arg Arg
245 250 255

Lys Val Lys Val Leu Val Leu Val Val Leu Ala Val Cys Leu Leu Cys
260 265 270

Trp Thr Pro Phe His Leu Ala Ser Val Val Ala Leu Thr Thr Asp Leu
275 280 285

Pro Gln Thr Pro Leu Val Ile Ser Met Ser Tyr Val Ile Thr Ser Leu
290 295 300

Thr Tyr Ala Asn Ser Cys Leu Asn Pro Phe Leu Tyr Ala Phe Leu Asp
305 310 315 320

Asp Asn Phe Arg Lys Asn Phe Arg Ser Ile Leu Arg Cys
325 330



(174) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1107 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:

ATGGTCCTTG AGGTGAGTGA CCACCAAGTG CTAAATGACG CCGAGGTTGC CGCCCTCCTG 60

GAGAACTTCA GCTCTTCCTA TGACTATGGA GAAAACGAGA GTGACTCGTG CTGTACCTCC 120

CCGCCCTGCC CACAGGACTT CAGCCTGAAC TTCGACCGGG CCTTCCTGCC AGCCCTCTAC 180

AGCCTCCTCT TTCTGCTGGG GCTGCTGGGC AACGGCGCGG TGGCAGCCGT GCTGCTGAGC 240

CGGCGGACAG CCCTGAGCAG CACCGACACC TTCCTGCTCC ACCTAGCTGT AGCAGACACG 300

CTGCTGGTGC TGACACTGCC GCTCTGGGCA GTGGACGCTG CCGTCCAGTG GGTCTTTGGC 360

TCTGGCCTCT GCAAAGTGGC AGGTGCCCTC TTCAACATCA ACTTCTACGC AGGAGCCCTC 420

CTGCTGGCCT GCATCAGCTT TGACCGCTAC CTGAACATAG TTCATGCCAC CCAGCTCTAC 480

CGCCGGGGGC CCCCGGCCCG CGTGACCCTC ACCTGCCTGG CTGTCTGGGG GCTCTGCCTG 540

CTTTTCGCCC TCCCAGACTT CATCTTCCTG TCGGCCCACC ACGACGAGCG CCTCAACGCC 600

ACCCACTGCC AATACAACTT CCCACAGGTG GGCCGCACGG CTCTGCGGGT GCTGCAGCTG 660

GTGGCTGGCT TTCTGCTGCC CCTGCTGGTC ATGGCCTACT GCTATGCCCA CATCCTGGCC 720

GTGCTGCTGG TTTCCAGGGG CCAGCGGCGC CTGCGGGCCA AGCGGCTGGT GGTGGTGGTC 780

GTGGTGGCCT TTGCCCTCTG CTGGACCCCC TATCACCTGG TGGTGCTGGT GGACATCCTC 840

ATGGACCTGG GCGCTTTGGC CCGCAACTGT GGCCGAGAAA GCAGGGTAGA CGTGGCCAAG 900

TCGGTCACCT CAGGCCTGGG CTACATGCAC TGCTGCCTCA ACCCGCTGCT CTATGCCTTT 960

GTAGGGGTCA AGTTCCGGGA GCGGATGTGG ATGCTGCTCT TGCGCCTGGG CTGCCCCAAC 1020

CAGAGAGGGC TCCAGAGGCA GCCATCGTCT TCCCGCCGGG ATTCATCCTG GTCTGAGACC 1080

TCAGAGGCCT CCTACTCGGG CTTGTGA 1107
(175) INFORMATION FOR SEQ ID NO: 174:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:

Met Val Leu Glu Val Ser Asp His Gln Val Leu Asn Asp Ala Glu Val
1 5 10 15

Ala Ala Leu Leu Glu Asn Phe Ser Ser Ser Tyr Asp Tyr Gly Glu Asn
20 25 30

Glu Ser Asp Ser Cys Cys Thr Ser Pro Pro Cys Pro Gln Asp Phe Ser
35 40 45

Leu Asn Phe Asp Arg Ala Phe Leu Pro Ala Leu Tyr Ser Leu Leu Phe
50 55 60

Leu Leu Gly Leu Leu Gly Asn Gly Ala Val Ala Ala Val Leu Leu Ser
65 70 75 80

Arg Arg Thr Ala Leu Ser Ser Thr Asp Thr Phe Leu Leu His Leu Ala
85 90 95

Val Ala Asp Thr Leu Leu Val Leu Thr Leu Pro Leu Trp Ala Val Asp
100 105 110

Ala Ala Val Gln Trp Val Phe Gly Ser Gly Leu Cys Lys Val Ala Gly
115 120 125

Ala Leu Phe Asn Ile Asn Phe Tyr Ala Gly Ala Leu Leu Leu Ala Cys
130 135 140

Ile Ser Phe Asp Arg Tyr Leu Asn Ile Val His Ala Thr Gln Leu Tyr
145 150 155 160

Arg Arg Gly Pro Pro Ala Arg Val Thr Leu Thr Cys Leu Ala Val Trp
165 170 175

Gly Leu Cys Leu Leu Phe Ala Leu Pro Asp Phe Ile Phe Leu Ser Ala
180 185 190

His His Asp Glu Arg Leu Asn Ala Thr His Cys Gln Tyr Asn Phe Pro
195 200 205

Gln Val Gly Arg Thr Ala Leu Arg Val Leu Gln Leu Val Ala Gly Phe
210 215 220

Leu Leu Pro Leu Leu Val Met Ala Tyr Cys Tyr Ala His Ile Leu Ala
225 230 235 240

Val Leu Leu Val Ser Arg Gly Gln Arg Arg Leu Arg Ala Lys Arg Leu
245 250 255

Val Val Val Val Val Val Ala Phe Ala Leu Cys Trp Thr Pro Tyr His
260 265 270

Leu Val Val Leu Val Asp Ile Leu Met Asp Leu Gly Ala Leu Ala Arg
275 280 285

Asn Cys Gly Arg Glu Ser Arg Val Asp Val Ala Lys Ser Val Thr Ser
290 295 300

Gly Leu Gly Tyr Met His Cys Cys Leu Asn Pro Leu Leu Tyr Ala Phe
305 310 315 320

Val Gly Val Lys Phe Arg Glu Arg Met Trp Met Leu Leu Leu Arg Leu
325 330 335

Gly Cys Pro Asn Gln Arg Gly Leu Gln Arg Gln Pro Ser Ser Ser Arg
340 345 350

Arg Asp Ser Ser Trp Ser Glu Thr Ser Glu Ala Ser Tyr Ser Gly Leu
355 360 365

(176) INFORMATION FOR SEQ ID NO: 175:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1074 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175:

ATGGCTGATG ACTATGGCTC TGAATCCACA TCTTCCATGG AAGACTACGT TAACTTCAAC 60

TTCACTGACT TCTACTGTGA GAAAAACAAT GTCAGGCAGT TTGCGAGCCA TTTCCTCCCA 120

CCCTTGTACT GGCTCGTGTT CATCGTGGGT GCCTTGGGCA ACAGTCTTGT TATCCTTGTC 180

TACTGGTACT GCACAAGAGT GAAGACCATG ACCGACATGT TCCTTTTGAA TTTGGCAATT 240

GCTGACCTCC TCTTTCTTGT CACTCTTCCC TTCTGGGCCA TTGCTGCTGC TGACCAGTGG 300

AAGTTCCAGA CCTTCATGTG CAAGGTGGTC AACAGCATGT ACAAGATGAA CTTCTACAGC 360

TGTGTGTTGC TGATCATGTG CATCAGCGTG GACAGGTACA TTGCCATTGC CCAGGCCATG 420

AGAGCACATA CTTGGAGGGA GAAAAGGCTT TTGTACAGCA AAATGGTTTG CTTTACCATC 480

TGGGTATTGG CAGCTGCTCT CTGCATCCCA GAAATCTTAT ACAGCCAAAT CAAGGAGGAA 540

TCCGGCATTG CTATCTGCAC CATGGTTTAC CCTAGCGATG AGAGCACCAA ACTGAAGTCA 600

GCTGTCTTGA CCCTGAAGGT CATTCTGGGG TTCTTCCTTC CCTTCGTGGT CATGGCTTGC 660

TGCTATACCA TCATCATTCA CACCCTGATA CAAGCCAAGA AGTCTTCCAA GCACAAAGCC 720

AAGAAAGTGA CCATCACTGT CCTGACCGTC TTTGTCTTGT CTCAGTTTCC CTACAACTGC 780

ATTTTGTTGG TGCAGACCAT TGACGCCTAT GCCATGTTCA TCTCCAACTG TGCCGTTTCC 840

ACCAACATTG ACATCTGCTT CCAGGTCACC CAGACCATCG CCTTCTTCCA CAGTTGCCTG 900

AACCCTGTTC TCTATGTTTT TGTGGGTGAG AGATTCCGCC GGGATCTCGT GAAAACCCTG 960

AAGAACTTGG GTTGCATCAG CCAGGCCCAG TGGGTTTCAT TTACAAGGAG AGAGGGAAGC 1020

TTGAAGCTGT CGTCTATGTT GCTGGAGACA ACCTCAGGAG CACTCTCCCT CTGA 1074
(177) INFORMATION FOR SEQ ID NO: 176:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 357 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:

Met Ala Asp Asp Tyr Gly Ser Glu Ser Thr Ser Ser Met Glu Asp Tyr
1 5 10 15

Val Asn Phe Asn Phe Thr Asp Phe Tyr Cys Glu Lys Asn Asn Val Arg
20 25 30

Gln Phe Ala Ser His Phe Leu Pro Pro Leu Tyr Trp Leu Val Phe Ile
35 40 45

Val Gly Ala Leu Gly Asn Ser Leu Val Ile Leu Val Tyr Trp Tyr Cys
50 55 60

Thr Arg Val Lys Thr Met Thr Asp Met Phe Leu Leu Asn Leu Ala Ile
65 70 75 80

Ala Asp Leu Leu Phe Leu Val Thr Leu Pro Phe Trp Ala Ile Ala Ala
85 90 95

Ala Asp Gln Trp Lys Phe Gln Thr Phe Met Cys Lys Val Val Asn Ser
100 105 110

Met Tyr Lys Met Asn Phe Tyr Ser Cys Val Leu Leu Ile Met Cys Ile
115 120 125

Ser Val Asp Arg Tyr Ile Ala Ile Ala Gln Ala Met Arg Ala His Thr
130 135 140

Trp Arg Glu Lys Arg Leu Leu Tyr Ser Lys Met Val Cys Phe Thr Ile
145 150 155 160

Trp Val Leu Ala Ala Ala Leu Cys Ile Pro Glu Ile Leu Tyr Ser Gln
165 170 175

Ile Lys Glu Glu Ser Gly Ile Ala Ile Cys Thr Met Val Tyr Pro Ser
180 185 190

Asp Glu Ser Thr Lys Leu Lys Ser Ala Val Leu Thr Leu Lys Val Ile
195 200 205

Leu Gly Phe Phe Leu Pro Phe Val Val Met Ala Cys Cys Tyr Thr Ile
210 215 220

Ile Ile His Thr Leu Ile Gln Ala Lys Lys Ser Ser Lys His Lys Ala
225 230 235 240

Lys Lys Val Thr Ile Thr Val Leu Thr Val Phe Val Leu Ser Gln Phe
245 250 255

Pro Tyr Asn Cys Ile Leu Leu Val Gln Thr Ile Asp Ala Tyr Ala Met
260 265 270

Phe Ile Ser Asn Cys Ala Val Ser Thr Asn Ile Asp Ile Cys Phe Gln
275 280 285

Val Thr Gln Thr Ile Ala Phe Phe His Ser Cys Leu Asn Pro Val Leu
290 295 300

Tyr Val Phe Val Gly Glu Arg Phe Arg Arg Asp Leu Val Lys Thr Leu
305 310 315 320

Lys Asn Leu Gly Cys Ile Ser Gln Ala Gln Trp Val Ser Phe Thr Arg
325 330 335

Arg Glu Gly Ser Leu Lys Leu Ser Ser Met Leu Leu Glu Thr Thr Ser
340 345 350

Gly Ala Leu Ser Leu
355

(178) INFORMATION FOR SEQ ID NO: 177:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1110 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:

ATGGCCTCAT CGACCACTCG GGGCCCCAGG GTTTCTGACT TATTTTCTGG GCTGCCGCCG 60

GCGGTCACAA CTCCCGCCAA CCAGAGCGCA GAGGCCTCGG CGGGCAACGG GTCGGTGGCT 120

GGCGCGGACG CTCCAGCCGT CACGCCCTTC CAGAGCCTGC AGCTGGTGCA TCAGCTGAAG 180

GGGCTGATCG TGCTGCTCTA CAGCGTCGTG GTGGTCGTGG GGCTGGTGGG CAACTGCCTG 240

CTGGTGCTGG TGATCGCGCG GGTGCCGCGG CTGCACAACG TGACGAACTT CCTCATCGGC 300

AACCTGGCCT TGTCCGACGT GCTCATGTGC ACCGCCTGCG TGCCGCTCAC GCTGGCCTAT 360

GCCTTCGAGC CACGCGGCTG GGTGTTCGGC GGCGGCCTGT GCCACCTGGT CTTCTTCCTG 420

CAGCCGGTCA CCGTCTATGT GTCGGTGTTC ACGCTCACCA CCATCGCAGT GGACCGCTAC 480

GTCGTGCTGG TGCACCCGCT GAGGCGCGCA TCTCGCTGCG CCTCAGCCTA CGCTGTGCTG 540

GCCATCTGGG CGCTGTCCGC GGTGCTGGCG CTGCCGCCCG CCGTGCACAC CTATCACGTG 600

GAGCTCAAGC CGCACGACGT GCGCCTCTGC GAGGAGTTCT GGGGCTCCCA GGAGCGCCAG 660

CGCCAGCTCT ACGCCTGGGG GCTGCTGCTG GTCACCTACC TGCTCCCTCT GCTGGTCATC 720

CTCCTGTCTT ACGTCCGGGT GTCAGTGAAG CTCCGCAACC GCGTGGTGCC GGGCTGCGTG 780

ACCCAGAGCC AGGCCGACTG GGACCGCGCT CGGCGCCGGC GCACCAAATG CTTGCTGGTG 840

GTGGTCGTGG TGGTGTTCGC CGTCTGCTGG CTGCCGCTGC ACGTCTTCAA CCTGCTGCGG 900

GACCTCGACC CCCACGCCAT CGACCCTTAC GCCTTTGGGC TGGTGCAGCT GCTCTGCCAC 960

TGGCTCGCCA TGAGTTCGGC CTGCTACAAC CCCTTCATCT ACGCCTGGCT GCACGACAGC 1020

TTCCGCGAGG AGCTGCGCAA ACTGTTGGTC GCTTGGCCCC GCAAGATAGC CCCCCATGGC 1080

CAGAATATGA CCGTCAGCGT GGTCATCTGA 1110

(179) INFORMATION FOR SEQ ID NO: 178:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 369 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:

Met Ala Ser Ser Thr Thr Arg Gly Pro Arg Val Ser Asp Leu Phe Ser
1 5 10 15

Gly Leu Pro Pro Ala Val Thr Thr Pro Ala Asn Gln Ser Ala Glu Ala
20 25 30

Ser Ala Gly Asn Gly Ser Val Ala Gly Ala Asp Ala Pro Ala Val Thr
35 40 45

Pro Phe Gln Ser Leu Gln Leu Val His Gln Leu Lys Gly Leu Ile Val
50 55 60

Leu Leu Tyr Ser Val Val Val Val Val Gly Leu Val Gly Asn Cys Leu
65 70 75 80

Leu Val Leu Val Ile Ala Arg Val Pro Arg Leu His Asn Val Thr Asn
85 90 95

Phe Leu Ile Gly Asn Leu Ala Leu Ser Asp Val Leu Met Cys Thr Ala
100 105 110

Cys Val Pro Leu Thr Leu Ala Tyr Ala Phe Glu Pro Arg Gly Trp Val
115 120 125

Phe Gly Gly Gly Leu Cys His Leu Val Phe Phe Leu Gln Pro Val Thr
130 135 140

Val Tyr Val Ser Val Phe Thr Leu Thr Thr Ile Ala Val Asp Arg Tyr
145 150 155 160

Val Val Leu Val His Pro Leu Arg Arg Ala Ser Arg Cys Ala Ser Ala
165 170 175

Tyr Ala Val Leu Ala Ile Trp Ala Leu Ser Ala Val Leu Ala Leu Pro
180 185 190

Pro Ala Val His Thr Tyr His Val Glu Leu Lys Pro His Asp Val Arg
195 200 205

Leu Cys Glu Glu Phe Trp Gly Ser Gln Glu Arg Gln Arg Gln Leu Tyr
210 215 220

Ala Trp Gly Leu Leu Leu Val Thr Tyr Leu Leu Pro Leu Leu Val Ile
225 230 235 240

Leu Leu Ser Tyr Val Arg Val Ser Val Lys Leu Arg Asn Arg Val Val
245 250 255

Pro Gly Cys Val Thr Gln Ser Gln Ala Asp Trp Asp Arg Ala Arg Arg
260 265 270

Arg Arg Thr Lys Cys Leu Leu Val Val Val Val Val Val Phe Ala Val
275 280 285

Cys Trp Leu Pro Leu His Val Phe Asn Leu Leu Arg Asp Leu Asp Pro
290 295 300

His Ala Ile Asp Pro Tyr Ala Phe Gly Leu Val Gln Leu Leu Cys His
305 310 315 320

Trp Leu Ala Met Ser Ser Ala Cys Tyr Asn Pro Phe Ile Tyr Ala Trp
325 330 335

Leu His Asp Ser Phe Arg Glu Glu Leu Arg Lys Leu Leu Val Ala Trp
340 345 350

Pro Arg Lys Ile Ala Pro His Gly Gln Asn Met Thr Val Ser Val Val
355 360 365

Ile
(180) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1083 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:

ATGGACCCAG AAGAAACTTC AGTTTATTTG GATTATTACT ATGCTACGAG CCCAAACTCT 60

GACATCAGGG AGACCCACTC CCATGTTCCT TACACCTCTG TCTTCCTTCC AGTCTTTTAC 120

ACAGCTGTGT TCCTGACTGG AGTGCTGGGG AACCTTGTTC TCATGGGAGC GTTGCATTTC 180

AAACCCGGCA GCCGAAGACT GATCGACATC TTTATCATCA ATCTGGCTGC CTCTGACTTC 240

ATTTTTCTTG TCACATTGCC TCTCTGGGTG GATAAAGAAG CATCTCTAGG ACTGTGGAGG 300

ACGGGCTCCT TCCTGTGCAA AGGGAGCTCC TACATGATCT CCGTCAATAT GCACTGCAGT 360

GTCCTCCTGC TCACTTGCAT GAGTGTTGAC CGCTACCTGG CCATTGTGTG GCCAGTCGTA 420

TCCAGGAAAT TCAGAAGGAC AGACTGTGCA TATGTAGTCT GTGCCAGCAT CTGGTTTATC 480

TCCTGCCTGC TGGGGTTGCC TACTCTTCTG TCCAGGGAGC TCACGCTGAT TGATGATAAG 540

CCATACTGTG CAGAGAAAAA GGCAACTCCA ATTAAACTCA TATGGTCCCT GGTGGCCTTA 600

ATTTTCACCT TTTTTGTCCC TTTGTTGAGC ATTGTGACCT GCTACTGTTG CATTGCAAGG 660

AAGCTGTGTG CCCATTACCA GCAATCAGGA AAGCACAACA AAAAGCTGAA GAAATCTAAG 720

AAGATCATCT TTATTGTCGT GGCAGCCTTT CTTGTCTCCT GGCTGCCCTT CAATACTTTC 780

AAGTTCCTGG CCATTGTCTC TGGGTTGCGG CAAGAACACT ATTTACCCTC AGCTATTCTT 840

CAGCTTGGTA TGGAGGTGAG TGGACCCTTG GCATTTGCCA ACAGCTGTGT CAACCCTTTC 900

ATTTACTATA TCTTCGACAG CTACATCCGC CGGGCCATTG TCCACTGCTT GTGCCCTTGC 960

CTGAAAAACT ATGACTTTGG GAGTAGCACT GAGACATCAG ATAGTCACCT CACTAAGGCT 1020

CTCTCCACCT TCATTCATGC AGAAGATTTT GCCAGGAGGA GGAAGAGGTC TGTGTCACTC 1080

TAA 1083
(181) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 360 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180:

Met Asp Pro Glu Glu Thr Ser Val Tyr Leu Asp Tyr Tyr Tyr Ala Thr
1 5 10 15

Ser Pro Asn Ser Asp Ile Arg Glu Thr His Ser His Val Pro Tyr Thr
20 25 30

Ser Val Phe Leu Pro Val Phe Tyr Thr Ala Val Phe Leu Thr Gly Val
35 40 45

Leu Gly Asn Leu Val Leu Met Gly Ala Leu His Phe Lys Pro Gly Ser
50 55 60

Arg Arg Leu Ile Asp Ile Phe Ile Ile Asn Leu Ala Ala Ser Asp Phe
65 70 75 80

Ile Phe Leu Val Thr Leu Pro Leu Trp Val Asp Lys Glu Ala Ser Leu
85 90 95

Gly Leu Trp Arg Thr Gly Ser Phe Leu Cys Lys Gly Ser Ser Tyr Met
100 105 110

Ile Ser Val Asn Met His Cys Ser Val Leu Leu Leu Thr Cys Met Ser
115 120 125

Val Asp Arg Tyr Leu Ala Ile Val Trp Pro Val Val Ser Arg Lys Phe
130 135 140

Arg Arg Thr Asp Cys Ala Tyr Val Val Cys Ala Ser Ile Trp Phe Ile
145 150 155 160

Ser Cys Leu Leu Gly Leu Pro Thr Leu Leu Ser Arg Glu Leu Thr Leu
165 170 175

Ile Asp Asp Lys Pro Tyr Cys Ala Glu Lys Lys Ala Thr Pro Ile Lys
180 185 190

Leu Ile Trp Ser Leu Val Ala Leu Ile Phe Thr Phe Phe Val Pro Leu
195 200 205

Leu Ser Ile Val Thr Cys Tyr Cys Cys Ile Ala Arg Lys Leu Cys Ala
210 215 220

His Tyr Gln Gln Ser Gly Lys His Asn Lys Lys Leu Lys Lys Ser Lys
225 230 235 240

Lys Ile Ile Phe Ile Val Val Ala Ala Phe Leu Val Ser Trp Leu Pro
245 250 255

Phe Asn Thr Phe Lys Phe Leu Ala Ile Val Ser Gly Leu Arg Gln Glu
260 265 270

His Tyr Leu Pro Ser Ala Ile Leu Gln Leu Gly Met Glu Val Ser Gly
275 280 285

Pro Leu Ala Phe Ala Asn Ser Cys Val Asn Pro Phe Ile Tyr Tyr Ile
290 295 300

Phe Asp Ser Tyr Ile Arg Arg Ala Ile Val His Cys Leu Cys Pro Cys
305 310 315 320

Leu Lys Asn Tyr Asp Phe Gly Ser Ser Thr Glu Thr Ser Asp Ser His
325 330 335

Leu Thr Lys Ala Leu Ser Thr Phe Ile His Ala Glu Asp Phe Ala Arg
340 345 350

Arg Arg Lys Arg Ser Val Ser Leu
355 360

(182) INFORMATION FOR SEQ ID NO: 181:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1020 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:

ATGAATGGCC TTGAAGTGGC TCCCCCAGGT CTGATCACCA ACTTCTCCCT GGCCACGGCA 60

GAGCAATGTG GCCAGGAGAC GCCACTGGAG AACATGCTGT TCGCCTCCTT CTACCTTCTG 120

GATTTTATCC TGGCTTTAGT TGGCAATACC CTGGCTCTGT GGCTTTTCAT CCGAGACCAC 180

AAGTCCGGGA CCCCGGCCAA CGTGTTCCTG ATGCATCTGG CCGTGGCCGA CTTGTCGTGC 240

GTGCTGGTCC TGCCCACCCG CCTGGTCTAC CACTTCTCTG GGAACCACTG GCCATTTGGG 300

GAAATCGCAT GCCGTCTCAC CGGCTTCCTC TTCTACCTCA ACATGTACGC CAGCATCTAC 360

TTCCTCACCT GCATCAGCGC CGACCGTTTC CTGGCCATTG TGCACCCGGT CAAGTCCCTC 420

AAGCTCCGCA GGCCCCTCTA CGCACACCTG GCCTGTGCCT TCCTGTGGGT GGTGGTGGCT 480

GTGGCCATGG CCCCGCTGCT GGTGAGCCCA CAGACCGTGC AGACCAACCA CACGGTGGTC 540

TGCCTGCAGC TGTACCGGGA GAAGGCCTCC CACCATGCCC TGGTGTCCCT GGCAGTGGCC 600

TTCACCTTCC CGTTCATCAC CACGGTCACC TGCTACCTGC TGATCATCCG CAGCCTGCGG 660

CAGGGCCTGC GTGTGGAGAA GCGCCTCAAG ACCAAGGCAA AACGCATGAT CGCCATAGTG 720

CTGGCCATCT TCCTGGTCTG CTTCGTGCCC TACCACGTCA ACCGCTCCGT CTACGTGCTG 780

CACTACCGCA GCCATGGGGC CTCCTGCGCC ACCCAGCGCA TCCTGGCCCT GGCAAACCGC 840

ATCACCTCCT GCCTCACCAG CCTCAACGGG GCACTCGACC CCATCATGTA TTTCTTCGTG 900

GCTGAGAAGT TCCGCCACGC CCTGTGCAAC TTGCTCTGTG GCAAAAGGCT CAAGGGCCCG 960

CCCCCCAGCT TCGAAGGGAA AACCAACGAG AGCTCGCTGA GTGCCAAGTC AGAGCTGTGA 1020

(183) INFORMATION FOR SEQ ID NO: 182:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:

Met Asn Gly Leu Glu Val Ala Pro Pro Gly Leu Ile Thr Asn Phe Ser
1 5 10 15

Leu Ala Thr Ala Glu Gln Cys Gly Gln Glu Thr Pro Leu Glu Asn Met
20 25 30

Leu Phe Ala Ser Phe Tyr Leu Leu Asp Phe Ile Leu Ala Leu Val Gly
35 40 45

Asn Thr Leu Ala Leu Trp Leu Phe Ile Arg Asp His Lys Ser Gly Thr
50 55 60

Pro Ala Asn Val Phe Leu Met His Leu Ala Val Ala Asp Leu Ser Cys
65 70 75 80

Val Leu Val Leu Pro Thr Arg Leu Val Tyr His Phe Ser Gly Asn His
85 90 95

Trp Pro Phe Gly Glu Ile Ala Cys Arg Leu Thr Gly Phe Leu Phe Tyr
100 105 110

Leu Asn Met Tyr Ala Ser Ile Tyr Phe Leu Thr Cys Ile Ser Ala Asp
115 120 125

Arg Phe Leu Ala Ile Val His Pro Val Lys Ser Leu Lys Leu Arg Arg
130 135 140

Pro Leu Tyr Ala His Leu Ala Cys Ala Phe Leu Trp Val Val Val Ala
145 150 155 160

Val Ala Met Ala Pro Leu Leu Val Ser Pro Gln Thr Val Gln Thr Asn
165 170 175

His Thr Val Val Cys Leu Gln Leu Tyr Arg Glu Lys Ala Ser His His
180 185 190

Ala Leu Val Ser Leu Ala Val Ala Phe Thr Phe Pro Phe Ile Thr Thr
195 200 205

Val Thr Cys Tyr Leu Leu Ile Ile Arg Ser Leu Arg Gln Gly Leu Arg
210 215 220

Val Glu Lys Arg Leu Lys Thr Lys Ala Lys Arg Met Ile Ala Ile Val
225 230 235 240

Leu Ala Ile Phe Leu Val Cys Phe Val Pro Tyr His Val Asn Arg Ser
245 250 255

Val Tyr Val Leu His Tyr Arg Ser His Gly Ala Ser Cys Ala Thr Gln
260 265 270

Arg Ile Leu Ala Leu Ala Asn Arg Ile Thr Ser Cys Leu Thr Ser Leu
275 280 285

Asn Gly Ala Leu Asp Pro Ile Met Tyr Phe Phe Val Ala Glu Lys Phe
290 295 300

Arg His Ala Leu Cys Asn Leu Leu Cys Gly Lys Arg Leu Lys Gly Pro
305 310 315 320

Pro Pro Ser Phe Glu Gly Lys Thr Asn Glu Ser Ser Leu Ser Ala Lys
325 330 335

Ser Glu Leu

(184) INFORMATION FOR SEQ ID NO: 183:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 996 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183:

ATGATCACCC TGAACAATCA AGATCAACCT GTCCCTTTTA ACAGCTCACA TCCAGATGAA 60

TACAAAATTG CAGCCCTTGT CTTCTATAGC TGTATCTTCA TAATTGGATT ATTTGTTAAC 120

ATCACTGCAT TATGGGTTTT CAGTTGTACC ACCAAGAAGA GAACCACGGT AACCATCTAT 180

ATGATGAATG TGGCATTAGT GGACTTGATA TTTATAATGA CTTTACCCTT TCGAATGTTT 240

TATTATGCAA AAGATGAATG GCCATTTGGA GAGTACTTCT GCCAGATTCT TGGAGCTCTC 300

ACAGTGTTTT ACCCAAGCAT TGCTTTATGG CTTCTTGCCT TTATTAGTGC TGACAGATAC 360

ATGGCCATTG TACAGCCGAA GTACGCCAAA GAACTTAAAA ACACGTGCAA AGCCGTGCTG 420

GCGTGTGTGG GAGTCTGGAT AATGACCCTG ACCACGACCA CCCCTCTGCT ACTGCTCTAT 480

AAAGACCCAG ATAAAGACTC CACTCCCGCC ACCTGCCTCA AGATTTCTGA CATCATCTAT 540

CTAAAAGCTG TGAACGTGCT GAACCTCACT CGACTGACAT TTTTTTTCTT GATTCCTTTG 600

TTCATCATGA TTGGGTGCTA CTTGGTCATT ATTCATAATC TCCTTCACGG CAGGACGTCT 660

AAGCTGAAAC CCAAAGTCAA GGAGAAGTCC AAAAGGATCA TCATCACGCT GCTGGTGCAG 720

GTGCTCGTCT GCTTTATGCC CTTCCACATC TGTTTCGCTT TCCTGATGCT GGGAACGGGG 780

GAGAATAGTT ACAATCCCTG GGGAGCCTTT ACCACCTTCC TCATGAACCT CAGCACGTGT 840

CTGGATGTGA TTCTCTACTA CATCGTTTCA AAACAATTTC AGGCTCGAGT CATTAGTGTC 900

ATGCTATACC GTAATTACCT TCGAAGCATG CGCAGAAAAA GTTTCCGATC TGGTAGTCTA 960

AGGTCACTAA GCAATATAAA CAGTGAAATG TTATGA 996
(185) INFORMATION FOR SEQ ID NO:184: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 331 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184:

Met Ile Thr Leu Asn Asn Gln Asp Gln Pro Val Pro Phe Asn Ser Ser
1 5 10 15

His Pro Asp Glu Tyr Lys Ile Ala Ala Leu Val Phe Tyr Ser Cys Ile
20 25 30

Phe Ile Ile Gly Leu Phe Val Asn Ile Thr Ala Leu Trp Val Phe Ser
35 40 45

Cys Thr Thr Lys Lys Arg Thr Thr Val Thr Ile Tyr Met Met Asn Val
50 55 60

Ala Leu Val Asp Leu Ile Phe Ile Met Thr Leu Pro Phe Arg Met Phe
65 70 75 80

Tyr Tyr Ala Lys Asp Glu Trp Pro Phe Gly Glu Tyr Phe Cys Gln Ile
85 90 95

Leu Gly Ala Leu Thr Val Phe Tyr Pro Ser Ile Ala Leu Trp Leu Leu
100 105 110

Ala Phe Ile Ser Ala Asp Arg Tyr Met Ala Ile Val Gln Pro Lys Tyr
115 120 125

Ala Lys Glu Leu Lys Asn Thr Cys Lys Ala Val Leu Ala Cys Val Gly
130 135 140

Val Trp Ile Met Thr Leu Thr Thr Thr Thr Pro Leu Leu Leu Leu Tyr
145 150 155 160

Lys Asp Pro Asp Lys Asp Ser Thr Pro Ala Thr Cys Leu Lys Ile Ser
165 170 175

Asp Ile Ile Tyr Leu Lys Ala Val Asn Val Leu Asn Leu Thr Arg Leu
180 185 190

Thr Phe Phe Phe Leu Ile Pro Leu Phe Ile Met Ile Gly Cys Tyr Leu
195 200 205

Val Ile Ile His Asn Leu Leu His Gly Arg Thr Ser Lys Leu Lys Pro
210 215 220

Lys Val Lys Glu Lys Ser Lys Arg Ile Ile Ile Thr Leu Leu Val Gln
225 230 235 240

Val Leu Val Cys Phe Met Pro Phe His Ile Cys Phe Ala Phe Leu Met
245 250 255

Leu Gly Thr Gly Glu Asn Ser Tyr Asn Pro Trp Gly Ala Phe Thr Thr
260 265 270

Phe Leu Met Asn Leu Ser Thr Cys Leu Asp Val Ile Leu Tyr Tyr Ile
275 280 285

Val Ser Lys Gln Phe Gln Ala Arg Val Ile Ser Val Met Leu Tyr Arg
290 295 300

Asn Tyr Leu Arg Ser Met Arg Arg Lys Ser Phe Arg Ser Gly Ser Leu
305 310 315 320

Arg Ser Leu Ser Asn Ile Asn Ser Glu Met Leu
325 330

(186) INFORMATION FOR SEQ ID NO: 185:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1077 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185:

ATGCCCTCTG TGTCTCCAGC GGGGCCCTCG GCCGGGGCAG TCCCCAATGC CACCGCAGTG 60

ACAACAGTGC GGACCAATGC CAGCGGGCTG GAGGTGCCCC TGTTCCACCT GTTTGCCCGG 120

CTGGACGAGG AGCTGCATGG CACCTTCCCA GGCCTGTGCG TGGCGCTGAT GGCGGTGCAC 180

GGAGCCATCT TCCTGGCAGG GCTGGTGCTC AACGGGCTGG CGCTGTACGT CTTCTGCTGC 240

CGCACCCGGG CCAAGACACC CTCAGTCATC TACACCATCA ACCTGGTGGT GACCGATCTA 300

CTGGTAGGGC TGTCCCTGCC CACGCGCTTC GCTGTGTACT ACGGCGCCAG GGGCTGCCTG 360

CGCTGTGCCT TCCCGCACGT CCTCGGTTAC TTCCTCAACA TGCACTGCTC CATCCTCTTC 420

CTCACCTGCA TCTGCGTGGA CCGCTACCTG GCCATCGTGC GGCCCGAAGG CTCCCGCCGC 480

TGCCGCCAGC CTGCCTGTGC CAGGGCCGTG TGCGCCTTCG TGTGGCTGGC CGCCGGTGCC 540

GTCACCCTGT CGGTGCTGGG CGTGACAGGC AGCCGGCCCT GCTGCCGTGT CTTTGCGCTG 600

ACTGTCCTGG AGTTCCTGCT GCCCCTGCTG GTCATCAGCG TGTTTACCGG CCGCATCATG 660

TGTGCACTGT CGCGGCCGGG TCTGCTCCAC CAGGGTCGCC AGCGCCGCGT GCGGGCCAAG 720

CAGCTCCTGC TCACGGTGCT CATCATCTTT CTCGTCTGCT TCACGCCCTT CCACGCCCGC 780

CAAGTGGCCG TGGCGCTGTG GCCCGACATG CCACACCACA CGAGCCTCGT GGTCTACCAC 840

GTGGCCGTGA CCCTCAGCAG CCTCAACAGC TGCATGGACC CCATCGTCTA CTGCTTCGTC 900

ACCAGTGGCT TCCAGGCCAC CGTCCGAGGC CTCTTCGGCC AGCACGGAGA GCGTGAGCCC 960

AGCAGCGGTG ACGTGGTCAG CATGCACAGG AGCTCCAAGG GCTCAGGCCG TCATCACATC 1020

CTCAGTGCCG GCCCTCACGC CCTCACCCAG GCCCTGGCTA ATGGGCCCGA GGCTTAG 1077
(187) INFORMATION FOR SEQ ID NO:186: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 358 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186:

Met Pro Ser Val Ser Pro Ala Gly Pro Ser Ala Gly Ala Val Pro Asn
1 5 10 15

Ala Thr Ala Val Thr Thr Val Arg Thr Asn Ala Ser Gly Leu Glu Val
20 25 30

Pro Leu Phe His Leu Phe Ala Arg Leu Asp Glu Glu Leu His Gly Thr
35 40 45

Phe Pro Gly Leu Cys Val Ala Leu Met Ala Val His Gly Ala Ile Phe
50 55 60

Leu Ala Gly Leu Val Leu Asn Gly Leu Ala Leu Tyr Val Phe Cys Cys
65 70 75 80

Arg Thr Arg Ala Lys Thr Pro Ser Val Ile Tyr Thr Ile Asn Leu Val
85 90 95

Val Thr Asp Leu Leu Val Gly Leu Ser Leu Pro Thr Arg Phe Ala Val
100 105 110

Tyr Tyr Gly Ala Arg Gly Cys Leu Arg Cys Ala Phe Pro His Val Leu
115 120 125

Gly Tyr Phe Leu Asn Met His Cys Ser Ile Leu Phe Leu Thr Cys Ile
130 135 140

Cys Val Asp Arg Tyr Leu Ala Ile Val Arg Pro Glu Gly Ser Arg Ala
145 150 155 160

Cys Arg Gln Pro Ala Cys Ala Arg Ala Val Cys Ala Phe Val Trp Leu
165 170 175

Ala Ala Gly Ala Val Thr Leu Ser Val Leu Gly Val Thr Gly Ser Arg
180 185 190

Pro Cys Cys Arg Val Phe Ala Leu Thr Val Leu Glu Phe Leu Leu Pro
195 200 205

Leu Leu Val Ile Ser Val Phe Thr Gly Arg Ile Met Cys Ala Leu Ser
210 215 220

Arg Pro Gly Leu Leu His Gln Gly Arg Gln Arg Arg Val Arg Ala Lys
225 230 235 240

Gln Leu Leu Leu Thr Val Leu Ile Ile Phe Leu Val Cys Phe Thr Pro
245 250 255

Phe His Ala Arg Gln Val Ala Val Ala Leu Trp Pro Asp Met Pro His
260 265 270

His Thr Ser Leu Val Val Tyr His Val Ala Val Thr Leu Ser Ser Leu
275 280 285

Asn Ser Cys Met Asp Pro Ile Val Tyr Cys Phe Val Thr Ser Gly Phe
290 295 300

Gln Ala Thr Val Arg Gly Leu Phe Gly Gln His Gly Glu Arg Glu Pro
305 310 315 320

Ser Ser Gly Asp Val Val Ser Met His Arg Ser Ser Lys Gly Ser Gly
325 330 335

Arg His His Ile Leu Ser Ala Gly Pro His Ala Leu Thr Gln Ala Leu
340 345 350

Ala Asn Gly Pro Glu Ala
355



(188) INFORMATION FOR SEQ ID NO: 187:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1050 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187:

ATGAACTCCA CCTTGGATGG TAATCAGAGC AGCCACCCTT TTTGCCTCTT GGCATTTGGC 60

TATTTGGAAA CTGTCAATTT TTGCCTTTTG GAAGTATTGA TTATTGTCTT TCTAACTGTA 120

TTGATTATTT CTGGCAACAT CATTGTGATT TTTGTATTTC ACTGTGCACC TTTGTTGAAC 180

CATCACACTA CAAGTTATTT TATCCAGACT ATGGCATATG CTGACCTTTT TGTTGGGGTG 240

AGCTGCGTGG TCCCTTCTTT ATCACTCCTC CATCACCCCC TTCCAGTAGA GGAGTCCTTG 300

ACTTGCCAGA TATTTGGTTT TGTAGTATCA GTTCTGAAGA GCGTCTCCAT GGCTTCTCTG 360

GCCTGTATCA GCATTGATAG ATACATTGCC ATTACTAAAC CTTTAACCTA TAATACTCTG 420

GTTACACCCT GGAGACTACG CCTGTGTATT TTCCTGATTT GGCTATACTC GACCCTGGTC 480

TTCCTGCCTT CCTTTTTCCA CTGGGGCAAA CCTGGATATC ATGGAGATGT GTTTCAGTGG 540

TGTGCGGAGT CCTGGCACAC CGACTCCTAC TTCACCCTGT TCATCGTGAT GATGTTATAT 600

GCCCCAGCAG CCCTTATTGT CTGCTTCACC TATTTCAACA TCTTCCGCAT CTGCCAACAG 660

CACACAAAGG ATATCAGCGA AAGGCAAGCC CGCTTCAGCA GCCAGAGTGG GGAGACTGGG 720

GAAGTGCAGG CCTGTCCTGA TAAGCGCTAT AAAATGGTCC TGTTTCGAAT CACTAGTGTA 780

TTTTACATCC TCTGGTTGCC ATATATCATC TACTTCTTGT TGGAAAGCTC CACTGGCCAC 840

AGCAACCGCT TCGCATCCTT CTTGACCACC TGGCTTGCTA TTAGTAACAG TTTCTGCAAC 900

TGTGTAATTT ATAGTCTCTC CAACAGTGTA TTCCAAAGAG GACTAAAGCG CCTCTCAGGG 960

GCTATGTGTA CTTCTTGTGC AAGTCAGACT ACAGCCAACG ACCCTTACAC AGTTAGAAGC 1020

AAAGGCCCTC TTAATGGATG TCATATCTGA 1050

(189) INFORMATION FOR SEQ ID NO: 188:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:

Met Asn Ser Thr Leu Asp Gly Asn Gln Ser Ser His Pro Phe Cys Leu
1 5 10 15

Leu Ala Phe Gly Tyr Leu Glu Thr Val Asn Phe Cys Leu Leu Glu Val
20 25 30

Leu Ile Ile Val Phe Leu Thr Val Leu Ile Ile Ser Gly Asn Ile Ile
35 40 45

Val Ile Phe Val Phe His Cys Ala Pro Leu Leu Asn His His Thr Thr
50 55 60

Ser Tyr Phe Ile Gln Thr Met Ala Tyr Ala Asp Leu Phe Val Gly Val
65 70 75 80

Ser Cys Val Val Pro Ser Leu Ser Leu Leu His His Pro Leu Pro Val
85 90 95

Glu Glu Ser Leu Thr Cys Gln Ile Phe Gly Phe Val Val Ser Val Leu
100 105 110

Lys Ser Val Ser Met Ala Ser Leu Ala Cys Ile Ser Ile Asp Arg Tyr
115 120 125

Ile Ala Ile Thr Lys Pro Leu Thr Tyr Asn Thr Leu Val Thr Pro Trp
130 135 140

Arg Leu Arg Leu Cys Ile Phe Leu Ile Trp Leu Tyr Ser Thr Leu Val
145 150 155 160

Phe Leu Pro Ser Phe Phe His Trp Gly Lys Pro Gly Tyr His Gly Asp
165 170 175

Val Phe Gln Trp Cys Ala Glu Ser Trp His Thr Asp Ser Tyr Phe Thr
180 185 190

Leu Phe Ile Val Met Met Leu Tyr Ala Pro Ala Ala Leu Ile Val Cys
195 200 205

Phe Thr Tyr Phe Asn Ile Phe Arg Ile Cys Gln Gln His Thr Lys Asp
210 215 220

Ile Ser Glu Arg Gln Ala Arg Phe Ser Ser Gln Ser Gly Glu Thr Gly
225 230 235 240

Glu Val Gln Ala Cys Pro Asp Lys Arg Tyr Lys Met Val Leu Phe Arg
245 250 255

Ile Thr Ser Val Phe Tyr Ile Leu Trp Leu Pro Tyr Ile Ile Tyr Phe
260 265 270

Leu Leu Glu Ser Ser Thr Gly His Ser Asn Arg Phe Ala Ser Phe Leu
275 280 285

Thr Thr Trp Leu Ala Ile Ser Asn Ser Phe Cys Asn Cys Val Ile Tyr
290 295 300

Ser Leu Ser Asn Ser Val Phe Gln Arg Gly Leu Lys Arg Leu Ser Gly
305 310 315 320

Ala Met Cys Thr Ser Cys Ala Ser Gln Thr Thr Ala Asn Asp Pro Tyr
325 330 335

Thr Val Arg Ser Lys Gly Pro Leu Asn Gly Cys His Ile
340 345

(190) INFORMATION FOR SEQ ID NO: 189: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:

ATGTGTTTTT CTCCCATTCT GGAAATCAAC ATGCAGTCTG AATCTAACAT TACAGTGCGA 60

GATGACATTG ATGACATCAA CACCAATATG TACCAACCAC TATCATATCC GTTAAGCTTT 120

CAAGTGTCTC TCACCGGATT TCTTATGTTA GAAATTGTGT TGGGACTTGG CAGCAACCTC 180

ACTGTATTGG TACTTTACTG CATGAAATCC AACTTAATCA ACTCTGTCAG TAACATTATT 240

ACAATGAATC TTCATGTACT TGATGTAATA ATTTGTGTGG GATGTATTCC TCTAACTATA 300

GTTATCCTTC TGCTTTCACT GGAGAGTAAC ACTGCTCTCA TTTGCTGTTT CCATGAGGCT 360

TGTGTATCTT TTGCAAGTGT CTCAACAGCA ATCAACGTTT TTGCTATCAC TTTGGACAGA 420

TATGACATCT CTGTAAAACC TGCAAACCGA ATTCTGACAA TGGGCAGAGC TGTAATGTTA 480

ATGATATCCA TTTGGATTTT TTCTTTTTTC TCTTTCCTGA TTCCTTTTAT TGAGGTAAAT 540

TTTTTCAGTC TTCAAAGTGG AAATACCTGG GAAAACAAGA CACTTTTATG TGTCAGTACA 600

AATGAATACT ACACTGAACT GGGAATGTAT TATCACCTGT TAGTACAGAT CCCAATATTC 660

TTTTTCACTG TTGTAGTAAT GTTAATCACA TACACCAAAA TACTTCAGGC TCTTAATATT 720

CGAATAGGCA CAAGATTTTC AACAGGGCAG AAGAAGAAAG CAAGAAAGAA AAAGACAATT 780

TCTCTAACCA CACAACATGA GGCTACAGAC ATGTCACAAA GCAGTGGTGG GAGAAATGTA 840

GTCTTTGGTG TAAGAACTTC AGTTTCTGTA ATAATTGCCC TCCGGCGAGC TGTGAAACGA 900

CACCGTGAAC GACGAGAAAG ACAAAAGAGA GTCAAGAGGA TGTCTTTATT GATTATTTCT 960

ACATTTCTTC TCTGCTGGAC ACCAATTTCT GTTTTAAATA CCACCATTTT ATGTTTAGGC 1020

CCAAGTGACC TTTTAGTAAA ATTAAGATTG TGTTTTTTAG TCATGGCTTA TGGAACAACT 1080

ATATTTCACC CTCTATTATA TGCATTCACT AGACAAAAAT TTCAAAAGGT CTTGAAAAGT 1140

AAAATGAAAA AGCGAGTTGT TTCTATAGTA GAAGCTGATC CCCTGCCTAA TAATGCTGTA 1200

ATACACAACT CTTGGATAGA TCCCAAAAGA AACAAAAAAA TTACCTTTGA AGATAGTGAA 1260

ATAAGAGAAA AACGTTTAGT GCCTCAGGTT GTCACAGACT AG 1302
(191) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 433 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:

Met Cys Phe Ser Pro Ile Leu Glu Ile Asn Met Gln Ser Glu Ser Asn
1 5 10 15

Ile Thr Val Arg Asp Asp Ile Asp Asp Ile Asn Thr Asn Met Tyr Gln
20 25 30

Pro Leu Ser Tyr Pro Leu Ser Phe Gln Val Ser Leu Thr Gly Phe Leu
35 40 45

Met Leu Glu Ile Val Leu Gly Leu Gly Ser Asn Leu Thr Val Leu Val
50 55 60

Leu Tyr Cys Met Lys Ser Asn Leu Ile Asn Ser Val Ser Asn Ile Ile
65 70 75 80

Thr Met Asn Leu His Val Leu Asp Val Ile Ile Cys Val Gly Cys Ile
85 90 95

Pro Leu Thr Ile Val Ile Leu Leu Leu Ser Leu Glu Ser Asn Thr Ala
100 105 110

Leu Ile Cys Cys Phe His Glu Ala Cys Val Ser Phe Ala Ser Val Ser
115 120 125

Thr Ala Ile Asn Val Phe Ala Ile Thr Leu Asp Arg Tyr Asp Ile Ser
130 135 140

Val Lys Pro Ala Asn Arg Ile Leu Thr Met Gly Arg Ala Val Met Leu
145 150 155 160

Met Ile Ser Ile Trp Ile Phe Ser Phe Phe Ser Phe Leu Ile Pro Phe
165 170 175

Ile Glu Val Asn Phe Phe Ser Leu Gln Ser Gly Asn Thr Trp Glu Asn
180 185 190

Lys Thr Leu Leu Cys Val Ser Thr Asn Glu Tyr Tyr Thr Glu Leu Gly
195 200 205

Met Tyr Tyr His Leu Leu Val Gln Ile Pro Ile Phe Phe Phe Thr Val
210 215 220

Val Val Met Leu Ile Thr Tyr Thr Lys Ile Leu Gln Ala Leu Asn Ile
225 230 235 240

Arg Ile Gly Thr Arg Phe Ser Thr Gly Gln Lys Lys Lys Ala Arg Lys
245 250 255

Lys Lys Thr Ile Ser Leu Thr Thr Gln His Glu Ala Thr Asp Met Ser
260 265 270

Gln Ser Ser Gly Gly Arg Asn Val Val Phe Gly Val Arg Thr Ser Val
275 280 285

Ser Val Ile Ile Ala Leu Arg Arg Ala Val Lys Arg His Arg Glu Arg
290 295 300

Arg Glu Arg Gln Lys Arg Val Lys Arg Met Ser Leu Leu Ile Ile Ser
305 310 315 320

Thr Phe Leu Leu Cys Trp Thr Pro Ile Ser Val Leu Asn Thr Thr Ile
325 330 335

Leu Cys Leu Gly Pro Ser Asp Leu Leu Val Lys Leu Arg Leu Cys Phe
340 345 350

Leu Val Met Ala Tyr Gly Thr Thr Ile Phe His Pro Leu Leu Tyr Ala
355 360 365

Phe Thr Arg Gln Lys Phe Gln Lys Val Leu Lys Ser Lys Met Lys Lys
370 375 380

Arg Val Val Ser Ile Val Glu Ala Asp Pro Leu Pro Asn Asn Ala Val
385 390 395 400

Ile His Asn Ser Trp Ile Asp Pro Lys Arg Asn Lys Lys Ile Thr Phe
405 410 415

Glu Asp Ser Glu Ile Arg Glu Lys Arg Leu Val Pro Gln Val Val Thr
420 425 430

Asp

(192) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1209 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:

ATGTTGTGTC CTTCCAAGAC AGATGGCTCA GGGCACTCTG GTAGGATTCA CCAGGAAACT 60

CATGGAGAAG GGAAAAGGGA CAAGATTAGC AACAGTGAAG GGAGGGAGAA TGGTGGGAGA 120

GGATTCCAGA TGAACGGTGG GTCGCTGGAG GCTGAGCATG CCAGCAGGAT GTCAGTTCTC 180

AGAGCAAAGC CCATGTCAAA CAGCCAACGC TTGCTCCTTC TGTCCCCAGG ATCACCTCCT 240

CGCACGGGGA GCATCTCCTA CATCAACATC ATCATGCCTT CGGTGTTCGG CACCATCTGC 300

CTCCTGGGCA TCATCGGGAA CTCCACGGTC ATCTTCGCGG TCGTGAAGAA GTCCAAGCTG 360

CACTGGTGCA ACAACGTCCC CGACATCTTC ATCATCAACC TCTCGGTAGT AGATCTCCTC 420

TTTCTCCTGG GCATGCCCTT CATGATCCAC CAGCTCATGG GCAATGGGGT GTGGCACTTT 480

GGGGAGACCA TGTGCACCCT CATCACGGCC ATGGATGCCA ATAGTCAGTT CACCAGCACC 540

TACATCCTGA CCGCCATGGC CATTGACCGC TACCTGGCCA CTGTCCACCC CATCTCTTCC 600

ACGAAGTTCC GGAAGCCCTC TGTGGCCACC CTGGTGATCT GCCTCCTGTG GGCCCTCTCC 660

TTCATCAGCA TCACCCCTGT GTGGCTGTAT GCCAGACTCA TCCCCTTCCC AGGAGGTGCA 720

GTGGGCTGCG GCATACGCCT GCCCAACCCA GACACTGACC TCTACTGGTT CACCCTGTAC 780

CAGTTTTTCC TGGCCTTTGC CCTGCCTTTT GTGGTCATCA CAGCCGCATA CGTGAGGATC 840

CTGCAGCGCA TGACGTCCTC AGTGGCCCCC GCCTCCCAGC GCAGCATCCG GCTGCGGACA 900

AAGAGGGTGA AACGCACAGC CATCGCCATC TGTCTGGTCT TCTTTGTGTG CTGGGCACCC 960

TACTATGTGC TACAGCTGAC CCAGTTGTCC ATCAGCCGCC CGACCCTCAC CTTTGTCTAC 1020

TTATACAATG CGGCCATCAG CTTGGGCTAT GCCAACAGCT GCCTCAACCC CTTTGTGTAC 1080

ATCGTGCTCT GTGAGACGTT CCGCAAACGC TTGGTCCTGT CGGTGAAGCC TGCAGCCCAG 1140

GGGCAGCTTC GCGCTGTCAG CAACGCTCAG ACGGCTGACG AGGAGAGGAC AGAAAGCAAA 1200

GGCACCTGA 1209

(193) INFORMATION FOR SEQ ID NO: 192: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:

Met Leu Cys Pro Ser Lys Thr Asp Gly Ser Gly His Ser Gly Arg Ile
1 5 10 15

His Gln Glu Thr His Gly Glu Gly Lys Arg Asp Lys Ile Ser Asn Ser
20 25 30

Glu Gly Arg Glu Asn Gly Gly Arg Gly Phe Gln Met Asn Gly Gly Ser
35 40 45

Leu Glu Ala Glu His Ala Ser Arg Met Ser Val Leu Arg Ala Lys Pro
50 55 60

Met Ser Asn Ser Gln Arg Leu Leu Leu Leu Ser Pro Gly Ser Pro Pro
65 70 75 80

Arg Thr Gly Ser Ile Ser Tyr Ile Asn Ile Ile Met Pro Ser Val Phe
85 90 95

Gly Thr Ile Cys Leu Leu Gly Ile Ile Gly Asn Ser Thr Val Ile Phe
100 105 110

Ala Val Val Lys Lys Ser Lys Leu His Trp Cys Asn Asn Val Pro Asp
115 120 125

Ile Phe Ile Ile Asn Leu Ser Val Val Asp Leu Leu Phe Leu Leu Gly
130 135 140

Met Pro Phe Met Ile His Gln Leu Met Gly Asn Gly Val Trp His Phe
145 150 155 160

Gly Glu Thr Met Cys Thr Leu Ile Thr Ala Met Asp Ala Asn Ser Gln
165 170 175

Phe Thr Ser Thr Tyr Ile Leu Thr Ala Met Ala Ile Asp Arg Tyr Leu
180 185 190

Ala Thr Val His Pro Ile Ser Ser Thr Lys Phe Arg Lys Pro Ser Val
195 200 205

Ala Thr Leu Val Ile Cys Leu Leu Trp Ala Leu Ser Phe Ile Ser Ile
210 215 220

Thr Pro Val Trp Leu Tyr Ala Arg Leu Ile Pro Phe Pro Gly Gly Ala
225 230 235 240

Val Gly Cys Gly Ile Arg Leu Pro Asn Pro Asp Thr Asp Leu Tyr Trp
245 250 255

Phe Thr Leu Tyr Gln Phe Phe Leu Ala Phe Ala Leu Pro Phe Val Val
260 265 270

Ile Thr Ala Ala Tyr Val Arg Ile Leu Gln Arg Met Thr Ser Ser Val
275 280 285

Ala Pro Ala Ser Gln Arg Ser Ile Arg Leu Arg Thr Lys Arg Val Lys
290 295 300

Arg Thr Ala Ile Ala Ile Cys Leu Val Phe Phe Val Cys Trp Ala Pro
305 310 315 320

Tyr Tyr Val Leu Gln Leu Thr Gln Leu Ser Ile Ser Arg Pro Thr Leu
325 330 335

Thr Phe Val Tyr Leu Tyr Asn Ala Ala Ile Ser Leu Gly Tyr Ala Asn
340 345 350

Ser Cys Leu Asn Pro Phe Val Tyr Ile Val Leu Cys Glu Thr Phe Arg
355 360 365

Lys Arg Leu Val Leu Ser Val Lys Pro Ala Ala Gln Gly Gln Leu Arg
370 375 380

Ala Val Ser Asn Ala Gln Thr Ala Asp Glu Glu Arg Thr Glu Ser Lys
385 390 395 400

Gly Thr

(194) INFORMATION FOR SEQ ID NO:193: 

(i) SEQUENCE CHARACTERISTICS: 
30 (A) LENGTH: 1128 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

35 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:193: 

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAC 60

GCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

CTGGTGGTGA ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

CTGGCGGTGG CGGACCTCAT CCTGGTGGCC GACTCCCTCA TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GAAACGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCGCCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTGA 1128
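The coding sequence SEQ ID NO:193 and the protein SEQ ID NO:194 can be cross-checked by translating codons in reading frame 1. The following is a minimal Python sketch, not part of the original listing; it assumes the standard genetic code (NCBI translation table 1) and one-letter amino acid codes, and the helper names `clean` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the first, second and third codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(listing_line: str) -> str:
    """Keep only the bases of a listing line (drops spaces and the position number)."""
    return "".join(ch for ch in listing_line if ch in "ACGT")


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(aa)
    return "".join(protein)


# First line of SEQ ID NO:193, copied from the listing above.
line1 = "ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAC 60"
print(translate(clean(line1)))  # MDVTSQARGVGLEMYPGTAH
```

Run on the first 60 bases of SEQ ID NO:193, the sketch yields residues 1 to 20 of SEQ ID NO:194 (Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro Gly Thr Ala His) in one-letter form.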

(195) INFORMATION FOR SEQ ID NO:194:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:194:

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala His Ala Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Lys Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Ala Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365

Val Arg Phe Ser Ser Ala Val
370 375
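Each nucleic acid entry in this part of the listing is followed by the protein it encodes, and the two LENGTH fields are related by simple arithmetic: a coding sequence of n base pairs that starts at ATG and ends in a single stop codon encodes n / 3 - 1 residues (for example, 1128 / 3 - 1 = 375). The sketch below is an illustrative check, not something taken from the original document; it assumes every listed DNA entry is a complete open reading frame with exactly one terminal stop codon, and the name `expected_protein_length` is hypothetical.

```python
def expected_protein_length(cds_length_bp: int) -> int:
    """Residues encoded by a CDS that starts at ATG and ends with one stop codon."""
    if cds_length_bp % 3 != 0:
        raise ValueError("CDS length is not a whole number of codons")
    return cds_length_bp // 3 - 1  # drop the terminal stop codon


# (DNA SEQ ID NO, length in bp, protein SEQ ID NO, length in residues),
# as stated in the LENGTH fields of SEQ ID NOs:193 to 210.
PAIRS = [
    (193, 1128, 194, 375), (195, 960, 196, 319), (197, 1143, 198, 380),
    (199, 1119, 200, 372), (201, 1128, 202, 375), (203, 1137, 204, 378),
    (205, 1086, 206, 361), (207, 1446, 208, 481), (209, 1101, 210, 366),
]

for dna_id, bp, prot_id, aa in PAIRS:
    ok = expected_protein_length(bp) == aa
    print(f"SEQ ID NO:{dna_id} ({bp} bp) -> SEQ ID NO:{prot_id} ({aa} aa): "
          f"{'consistent' if ok else 'MISMATCH'}")
```

All nine pairs satisfy the rule, so a mismatch reported by a check of this kind would point to a transcription error in a LENGTH field rather than to a real difference.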

(196) INFORMATION FOR SEQ ID NO:195:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 960 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:195:

ATGCCATTCC CAAACTGCTC AGCCCCCAGC ACTGTGGTGG CCACAGCTGT GGGTGTCTTG 60

CTGGGGCTGG AGTGTGGGCT GGGTCTGCTG GGCAACGCGG TGGCGCTGTG GACCTTCCTG 120

TTCCGGGTCA GGGTGTGGAA GCCGTACGCT GTCTACCTGC TCAACCTGGC CCTGGCTGAC 180

CTGCTGTTGG CTGCGTGCCT GCCTTTCCTG GCCGCCTTCT ACCTGAGCCT CCAGGCTTGG 240

CATCTGGGCC GTGTGGGCTG CTGGGCCCTG CGCTTCCTGC TGGACCTCAG CCGCAGCGTG 300

GGGATGGCCT TCCTGGCCGC CGTGGCTTTG GACCGGTACC TCCGTGTGGT CCACCCTCGG 360

CTTAAGGTCA ACCTGCTGTC TCCTCAGGCG GCCCTGGGGG TCTCGGGCCT CGTCTGGCTC 420

CTGATGGTCG CCCTCACCTG CCCGGGCTTG CTCATCTCTG AGGCCGCCCA GAACTCCACC 480

AGGTGCCACA GTTTCTACTC CAGGGCAGAC GGCTCCTTCA GCATCATCTG GCAGGAAGCA 540

CTCTCCTGCC TTCAGTTTGT CCTCCCCTTT GGCCTCATCG TGTTCTGCAA TGCAGGCATC 600

ATCAGGGCTC TCCAGAAAAG ACTCCGGGAG CCTGAGAAAC AGCCCAAGCT TCAGCGGGCC 660

AAGGCACTGG TCACCTTGGT GGTGGTGCTG TTTGCTCTGT GCTTTCTGCC CTGCTTCCTG 720

GCCAGAGTCC TGATGCACAT CTTCCAGAAT CTGGGGAGCT GCAGGGCCCT TTGTGCAGTG 780

GCTCATACCT CGGATGTCAC GGGCAGCCTC ACCTACCTGC ACAGTGTCGT CAACCCCGTG 840

GTATACTGCT TCTCCAGCCC CACCTTCAGG AGCTCCTATC GGAGGGTCTT CCACACCCTC 900

CGAGGCAAAG GGCAGGCAGC AGAGCCCCCA GATTTCAACC CCAGAGACTC CTATTCCTGA 960

(197) INFORMATION FOR SEQ ID NO:196:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 319 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:196:

Met Pro Phe Pro Asn Cys Ser Ala Pro Ser Thr Val Val Ala Thr Ala
1 5 10 15

Val Gly Val Leu Leu Gly Leu Glu Cys Gly Leu Gly Leu Leu Gly Asn
20 25 30

Ala Val Ala Leu Trp Thr Phe Leu Phe Arg Val Arg Val Trp Lys Pro
35 40 45

Tyr Ala Val Tyr Leu Leu Asn Leu Ala Leu Ala Asp Leu Leu Leu Ala
50 55 60

Ala Cys Leu Pro Phe Leu Ala Ala Phe Tyr Leu Ser Leu Gln Ala Trp
65 70 75 80

His Leu Gly Arg Val Gly Cys Trp Ala Leu Arg Phe Leu Leu Asp Leu
85 90 95

Ser Arg Ser Val Gly Met Ala Phe Leu Ala Ala Val Ala Leu Asp Arg
100 105 110

Tyr Leu Arg Val Val His Pro Arg Leu Lys Val Asn Leu Leu Ser Pro
115 120 125

Gln Ala Ala Leu Gly Val Ser Gly Leu Val Trp Leu Leu Met Val Ala
130 135 140

Leu Thr Cys Pro Gly Leu Leu Ile Ser Glu Ala Ala Gln Asn Ser Thr
145 150 155 160

Arg Cys His Ser Phe Tyr Ser Arg Ala Asp Gly Ser Phe Ser Ile Ile
165 170 175

Trp Gln Glu Ala Leu Ser Cys Leu Gln Phe Val Leu Pro Phe Gly Leu
180 185 190

Ile Val Phe Cys Asn Ala Gly Ile Ile Arg Ala Leu Gln Lys Arg Leu
195 200 205

Arg Glu Pro Glu Lys Gln Pro Lys Leu Gln Arg Ala Lys Ala Leu Val
210 215 220

Thr Leu Val Val Val Leu Phe Ala Leu Cys Phe Leu Pro Cys Phe Leu
225 230 235 240

Ala Arg Val Leu Met His Ile Phe Gln Asn Leu Gly Ser Cys Arg Ala
245 250 255

Leu Cys Ala Val Ala His Thr Ser Asp Val Thr Gly Ser Leu Thr Tyr
260 265 270

Leu His Ser Val Val Asn Pro Val Val Tyr Cys Phe Ser Ser Pro Thr
275 280 285

Phe Arg Ser Ser Tyr Arg Arg Val Phe His Thr Leu Arg Gly Lys Gly
290 295 300

Gln Ala Ala Glu Pro Pro Asp Phe Asn Pro Arg Asp Ser Tyr Ser
305 310 315 

(198) INFORMATION FOR SEQ ID NO:197:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1143 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:197:

ATGGAGGAAG GTGGTGATTT TGACAACTAC TATGGGGCAG ACAACCAGTC TGAGTGTGAG 60

TACACAGACT GGAAATCCTC GGGGGCCCTC ATCCCTGCCA TCTACATGTT GGTCTTCCTC 120

CTGGGCACCA CGGGAAACGG TCTGGTGCTC TGGACCGTGT TTCGGAGCAG CCGGGAGAAG 180

AGGCGCTCAG CTGATATCTT CATTGCTAGC CTGGCGGTGG CTGACCTGAC CTTCGTGGTG 240

ACGCTGCCCC TGTGGGCTAC CTACACGTAC CGGGACTATG ACTGGCCCTT TGGGACCTTC 300

TTCTGCAAGC TCAGCAGCTA CCTCATCTTC GTCAACATGT ACGCCAGCGT CTTCTGCCTC 360

ACCGGCCTCA GCTTCGACCG CTACCTGGCC ATCGTGAGGC CAGTGGCCAA TGCTCGGCTG 420

AGGCTGCGGG TCAGCGGGGC CGTGGCCACG GCAGTTCTTT GGGTGCTGGC CGCCCTCCTG 480

GCCATGCCTG TCATGGTGTT ACGCACCACC GGGGACTTGG AGAACACCAC TAAGGTGCAG 540

TGCTACATGG ACTACTCCAT GGTGGCCACT GTGAGCTCAG AGTGGGCCTG GGAGGTGGGC 600

CTTGGGGTCT CGTCCACCAC CGTGGGCTTT GTGGTGCCCT TCACCATCAT GCTGACCTGT 660

TACTTCTTCA TCGCCCAAAC CATCGCTGGC CACTTCCGCA AGGAACGCAT CGAGGGCCTG 720

CGGAAGCGGC GCCGGCTTAA GAGCATCATC GTGGTGCTGG TGGTGACCTT TGCCCTGTGC 780

TGGATGCCCT ACCACCTGGT GAAGACGCTG TACATGCTGG GCAGCCTGCT GCACTGGCCC 840

TGTGACTTTG ACCTCTTCCT CATGAACATC TTCCCCTACT GCACCTGCAT CAGCTACGTC 900

AACAGCTGCC TCAACCCCTT CCTCTATGCC TTTTTCGACC CCCGCTTCCG CCAGGCCTGC 960

ACCTCCATGC TCTGCTGTGG CCAGAGCAGG TGCGCAGGCA CCTCCCACAG CAGCAGTGGG 1020

GAGAAGTCAG CCAGCTACTC TTCGGGGCAC AGCCAGGGGC CCGGCCCCAA CATCGGCAAG 1080

GGTGGAGAAC AGATGCACGA GAAATCCATC CCCTACAGCC AGGAGACCCT TGTGGTTGAC 1140

TAG 1143

(199) INFORMATION FOR SEQ ID NO:198:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 380 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:198:

Met Glu Glu Gly Gly Asp Phe Asp Asn Tyr Tyr Gly Ala Asp Asn Gln
1 5 10 15

Ser Glu Cys Glu Tyr Thr Asp Trp Lys Ser Ser Gly Ala Leu Ile Pro
20 25 30

Ala Ile Tyr Met Leu Val Phe Leu Leu Gly Thr Thr Gly Asn Gly Leu
35 40 45

Val Leu Trp Thr Val Phe Arg Ser Ser Arg Glu Lys Arg Arg Ser Ala
50 55 60

Asp Ile Phe Ile Ala Ser Leu Ala Val Ala Asp Leu Thr Phe Val Val
65 70 75 80

Thr Leu Pro Leu Trp Ala Thr Tyr Thr Tyr Arg Asp Tyr Asp Trp Pro
85 90 95

Phe Gly Thr Phe Phe Cys Lys Leu Ser Ser Tyr Leu Ile Phe Val Asn
100 105 110

Met Tyr Ala Ser Val Phe Cys Leu Thr Gly Leu Ser Phe Asp Arg Tyr
115 120 125

Leu Ala Ile Val Arg Pro Val Ala Asn Ala Arg Leu Arg Leu Arg Val
130 135 140

Ser Gly Ala Val Ala Thr Ala Val Leu Trp Val Leu Ala Ala Leu Leu
145 150 155 160

Ala Met Pro Val Met Val Leu Arg Thr Thr Gly Asp Leu Glu Asn Thr
165 170 175

Thr Lys Val Gln Cys Tyr Met Asp Tyr Ser Met Val Ala Thr Val Ser
180 185 190

Ser Glu Trp Ala Trp Glu Val Gly Leu Gly Val Ser Ser Thr Thr Val
195 200 205

Gly Phe Val Val Pro Phe Thr Ile Met Leu Thr Cys Tyr Phe Phe Ile
210 215 220

Ala Gln Thr Ile Ala Gly His Phe Arg Lys Glu Arg Ile Glu Gly Leu
225 230 235 240

Arg Lys Arg Arg Arg Leu Lys Ser Ile Ile Val Val Leu Val Val Thr
245 250 255

Phe Ala Leu Cys Trp Met Pro Tyr His Leu Val Lys Thr Leu Tyr Met
260 265 270

Leu Gly Ser Leu Leu His Trp Pro Cys Asp Phe Asp Leu Phe Leu Met
275 280 285

Asn Ile Phe Pro Tyr Cys Thr Cys Ile Ser Tyr Val Asn Ser Cys Leu
290 295 300

Asn Pro Phe Leu Tyr Ala Phe Phe Asp Pro Arg Phe Arg Gln Ala Cys
305 310 315 320

Thr Ser Met Leu Cys Cys Gly Gln Ser Arg Cys Ala Gly Thr Ser His
325 330 335

Ser Ser Ser Gly Glu Lys Ser Ala Ser Tyr Ser Ser Gly His Ser Gln
340 345 350

Gly Pro Gly Pro Asn Met Gly Lys Gly Gly Glu Gln Met His Glu Lys
355 360 365

Ser Ile Pro Tyr Ser Gln Glu Thr Leu Val Val Asp
370 375 380

(200) INFORMATION FOR SEQ ID NO:199:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1119 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:199:

ATGAACTACC CGCTAACGCT GGAAATGGAC CTCGAGAACC TGGAGGACCT GTTCTGGGAA 60

CTGGACAGAT TGGACAACTA TAACGACACC TCCCTGGTGG AAAATCATCT CTGCCCTGCC 120

ACAGAGGGTC CCCTCATGGC CTCCTTCAAG GCCGTGTTCG TGCCCGTGGC CTACAGCCTC 180

ATCTTCCTCC TGGGCGTGAT CGGCAACGTC CTGGTGCTGG TGATCCTGGA GCGGCACCGG 240

CAGACACGCA GTTCCACGGA GACCTTCCTG TTCCACCTGG CCGTGGCCGA CCTCCTGCTG 300

GTCTTCATCT TGCCCTTTGC CGTGGCCGAG GGCTCTGTGG GCTGGGTCCT GGGGACCTTC 360

CTCTGCAAAA CTGTGATTGC CCTGCACAAA GTCAACTTCT ACTGCAGCAG CCTGCTCCTG 420

GCCTGCATCG CCGTGGACCG CTACCTGGCC ATTGTCCACG CCGTCCATGC CTACCGCCAC 480

CGCCGCCTCC TCTCCATCCA CATCACCTGT GGGACCATCT GGCTGGTGGG CTTCCTCCTT 540

GCCTTGCCAG AGATTCTCTT CGCCAAAGTC AGCCAAGGCC ATCACAACAA CTCCCTGCCA 600

CGTTGCACCT TCTCCCAAGA GAACCAAGCA GAAACGCATG CCTGGTTCAC CTCCCGATTC 660

CTCTACCATG TGGCGGGATT CCTGCTGCCC ATGCTGGTGA TGGGCTGGTG CTACGTGGGG 720

GTAGTGCACA GGTTGCGCCA GGCCCAGCGG CGCCCTCAGC GGCAGAAGGC AAAAAGGGTG 780

GCCATCCTGG TGACAAGCAT CTTCTTCCTC TGCTGGTCAC CCTACCACAT CGTCATCTTC 840

CTGGACACCC TGGCGAGGCT GAAGGCCGTG GACAATACCT GCAAGCTGAA TGGCTCTCTC 900

CCCGTGGCCA TCACCATGTG TGAGTTCCTG GGCCTGGCCC ACTGCTGCCT CAACCCCATG 960

CTCTACACTT TCGCCGGCGT GAAGTTCCGC AGTGACCTGT CGCGGCTCCT GACCAAGCTG 1020

GGCTGTACCG GCCCTGCCTC CCTGTGCCAG CTCTTCCCTA GCTGGCGCAG GAGCAGTCTC 1080

TCTGAGTCAG AGAATGCCAC CTCTCTCACC ACGTTCTAG 1119

(201) INFORMATION FOR SEQ ID NO:200:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 372 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:200:

Met Asn Tyr Pro Leu Thr Leu Glu Met Asp Leu Glu Asn Leu Glu Asp
1 5 10 15

Leu Phe Trp Glu Leu Asp Arg Leu Asp Asn Tyr Asn Asp Thr Ser Leu
20 25 30

Val Glu Asn His Leu Cys Pro Ala Thr Glu Gly Pro Leu Met Ala Ser
35 40 45

Phe Lys Ala Val Phe Val Pro Val Ala Tyr Ser Leu Ile Phe Leu Leu
50 55 60

Gly Val Ile Gly Asn Val Leu Val Leu Val Ile Leu Glu Arg His Arg
65 70 75 80

Gln Thr Arg Ser Ser Thr Glu Thr Phe Leu Phe His Leu Ala Val Ala
85 90 95

Asp Leu Leu Leu Val Phe Ile Leu Pro Phe Ala Val Ala Glu Gly Ser
100 105 110

Val Gly Trp Val Leu Gly Thr Phe Leu Cys Lys Thr Val Ile Ala Leu
115 120 125

His Lys Val Asn Phe Tyr Cys Ser Ser Leu Leu Leu Ala Cys Ile Ala
130 135 140

Val Asp Arg Tyr Leu Ala Ile Val His Ala Val His Ala Tyr Arg His
145 150 155 160

Arg Arg Leu Leu Ser Ile His Ile Thr Cys Gly Thr Ile Trp Leu Val
165 170 175

Gly Phe Leu Leu Ala Leu Pro Glu Ile Leu Phe Ala Lys Val Ser Gln
180 185 190

Gly His His Asn Asn Ser Leu Pro Arg Cys Thr Phe Ser Gln Glu Asn
195 200 205

Gln Ala Glu Thr His Ala Trp Phe Thr Ser Arg Phe Leu Tyr His Val
210 215 220

Ala Gly Phe Leu Leu Pro Met Leu Val Met Gly Trp Cys Tyr Val Gly
225 230 235 240

Val Val His Arg Leu Arg Gln Ala Gln Arg Arg Pro Gln Arg Gln Lys
245 250 255

Ala Lys Arg Val Ala Ile Leu Val Thr Ser Ile Phe Phe Leu Cys Trp
260 265 270

Ser Pro Tyr His Ile Val Ile Phe Leu Asp Thr Leu Ala Arg Leu Lys
275 280 285

Ala Val Asp Asn Thr Cys Lys Leu Asn Gly Ser Leu Pro Val Ala Ile
290 295 300

Thr Met Cys Glu Phe Leu Gly Leu Ala His Cys Cys Leu Asn Pro Met
305 310 315 320

Leu Tyr Thr Phe Ala Gly Val Lys Phe Arg Ser Asp Leu Ser Arg Leu
325 330 335

Leu Thr Lys Leu Gly Cys Thr Gly Pro Ala Ser Leu Cys Gln Leu Phe
340 345 350

Pro Ser Trp Arg Arg Ser Ser Leu Ser Glu Ser Glu Asn Ala Thr Ser
355 360 365 

Leu Thr Thr Phe 
370 

(202) INFORMATION FOR SEQ ID NO:201:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 







(xi) SEQUENCE DESCRIPTION: SEQ ID NO:201:

ATGGATGTGA CTTCCCAAGC CCGGGGCGTG GGCCTGGAGA TGTACCCAGG CACCGCGCAG 60

CCTGCGGCCC CCAACACCAC CTCCCCCGAG CTCAACCTGT CCCACCCGCT CCTGGGCACC 120

GCCCTGGCCA ATGGGACAGG TGAGCTCTCG GAGCACCAGC AGTACGTGAT CGGCCTGTTC 180

CTCTCGTGCC TCTACACCAT CTTCCTCTTC CCCATCGGCT TTGTGGGCAA CATCCTGATC 240

           ACATCAGCTT CCGCGAGAAG ATGACCATCC CCGACCTGTA CTTCATCAAC 300

           CGGACCTCAT CCTGGTGGCC            TTGAGGTGTT CAACCTGCAC 360

GAGCGGTACT ACGACATCGC CGTCCTGTGC ACCTTCATGT CGCTCTTCCT GCAGGTCAAC 420

ATGTACAGCA GCGTCTTCTT CCTCACCTGG ATGAGCTTCG ACCGCTACAT CGCCCTGGCC 480

AGGGCCATGC GCTGCAGCCT GTTCCGCACC AAGCACCACG CCCGGCTGAG CTGTGGCCTC 540

ATCTGGATGG CATCCGTGTC AGCCACGCTG GTGCCCTTCA CCGCCGTGCA CCTGCAGCAC 600

ACCGACGAGG CCTGCTTCTG TTTCGCGGAT GTCCGGGAGG TGCAGTGGCT CGAGGTCACG 660

CTGGGCTTCA TCGTGCCCTT CGCCATCATC GGCCTGTGCT ACTCCCTCAT TGTCCGGGTG 720

CTGGTCAGGG CGCACCGGCA CCGTGGGCTG CGGCCCCGGC GGCAGAAGGC GAAGCGCATG 780

ATCCTCGCGG TGGTGCTGGT CTTCTTCGTC TGCTGGCTGC CGGAGAACGT CTTCATCAGC 840

GTGCACCTCC TGCAGCGGAC GCAGCCTGGG GCCGCTCCCT GCAAGCAGTC TTTCCGCCAT 900

GCCCACCCCC TCACGGGCCA CATTGTCAAC CTCACCGCCT TCTCCAACAG CTGCCTAAAC 960

CCCCTCATCT ACAGCTTTCT CGGGGAGACC TTCAGGGACA AGCTGAGGCT GTACATTGAG 1020

CAGAAAACAA ATTTGCCGGC CCTGAACCGC TTCTGTCACG CTGCCCTGAA GGCCGTCATT 1080

CCAGACAGCA CCGAGCAGTC GGATGTGAGG TTCAGCAGTG CCGTGTAG 1128

(203) INFORMATION FOR SEQ ID NO:202:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:202:

Met Asp Val Thr Ser Gln Ala Arg Gly Val Gly Leu Glu Met Tyr Pro
1 5 10 15

Gly Thr Ala Gln Pro Ala Ala Pro Asn Thr Thr Ser Pro Glu Leu Asn
20 25 30

Leu Ser His Pro Leu Leu Gly Thr Ala Leu Ala Asn Gly Thr Gly Glu
35 40 45

Leu Ser Glu His Gln Gln Tyr Val Ile Gly Leu Phe Leu Ser Cys Leu
50 55 60

Tyr Thr Ile Phe Leu Phe Pro Ile Gly Phe Val Gly Asn Ile Leu Ile
65 70 75 80

Leu Val Val Asn Ile Ser Phe Arg Glu Lys Met Thr Ile Pro Asp Leu
85 90 95

Tyr Phe Ile Asn Leu Ala Val Ala Asp Leu Ile Leu Val Ala Asp Ser
100 105 110

Leu Ile Glu Val Phe Asn Leu His Glu Arg Tyr Tyr Asp Ile Ala Val
115 120 125

Leu Cys Thr Phe Met Ser Leu Phe Leu Gln Val Asn Met Tyr Ser Ser
130 135 140

Val Phe Phe Leu Thr Trp Met Ser Phe Asp Arg Tyr Ile Ala Leu Ala
145 150 155 160

Arg Ala Met Arg Cys Ser Leu Phe Arg Thr Lys His His Ala Arg Leu
165 170 175

Ser Cys Gly Leu Ile Trp Met Ala Ser Val Ser Ala Thr Leu Val Pro
180 185 190

Phe Thr Ala Val His Leu Gln His Thr Asp Glu Ala Cys Phe Cys Phe
195 200 205

Ala Asp Val Arg Glu Val Gln Trp Leu Glu Val Thr Leu Gly Phe Ile
210 215 220

Val Pro Phe Ala Ile Ile Gly Leu Cys Tyr Ser Leu Ile Val Arg Val
225 230 235 240

Leu Val Arg Ala His Arg His Arg Gly Leu Arg Pro Arg Arg Gln Lys
245 250 255

Ala Lys Arg Met Ile Leu Ala Val Val Leu Val Phe Phe Val Cys Trp
260 265 270

Leu Pro Glu Asn Val Phe Ile Ser Val His Leu Leu Gln Arg Thr Gln
275 280 285

Pro Gly Ala Ala Pro Cys Lys Gln Ser Phe Arg His Ala His Pro Leu
290 295 300

Thr Gly His Ile Val Asn Leu Thr Ala Phe Ser Asn Ser Cys Leu Asn
305 310 315 320

Pro Leu Ile Tyr Ser Phe Leu Gly Glu Thr Phe Arg Asp Lys Leu Arg
325 330 335

Leu Tyr Ile Glu Gln Lys Thr Asn Leu Pro Ala Leu Asn Arg Phe Cys
340 345 350

His Ala Ala Leu Lys Ala Val Ile Pro Asp Ser Thr Glu Gln Ser Asp
355 360 365 

Val Arg Phe Ser Ser Ala Val 
370 375 
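Read side by side, SEQ ID NO:202 matches SEQ ID NO:194 except at three positions in this transcription: His20Gln, Ala21Pro and Ala312Thr. A small helper that lists such single-residue differences is sketched below; it is a hypothetical utility, not part of the original disclosure, and assumes two gap-free sequences of equal length in one-letter code. The fragments it is run on were transcribed by hand from the listing above, so any OCR error in the source would carry over.

```python
def substitutions(reference: str, variant: str, start: int = 1) -> list[tuple[str, int, str]]:
    """Return (reference residue, position, variant residue) for each mismatch.

    Both sequences use one-letter codes and must be aligned without gaps;
    `start` is the residue number of the first character.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (no gaps assumed)")
    return [
        (ref, pos, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=start)
        if ref != var
    ]


# Residues 17-24 and 305-320 of SEQ ID NO:194 (reference) and SEQ ID NO:202 (variant).
for ref_frag, var_frag, first in [
    ("GTAHAAAP", "GTAQPAAP", 17),
    ("TGHIVNLAAFSNSCLN", "TGHIVNLTAFSNSCLN", 305),
]:
    for ref, pos, var in substitutions(ref_frag, var_frag, start=first):
        print(f"{ref}{pos}{var}")
```

Comparing fragments with an explicit `start` keeps the reported numbering aligned with the position lines of the listing, which is easier to audit than indexing into the full 375-residue strings.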

(204) INFORMATION FOR SEQ ID NO:203: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1137 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:203: 

ATGGACCTGG GGAAACCAAT GAAAAGCGTG CTGGTGGTGG CTCTCCTTGT CATTTTCCAG 60

GTATGCCTGT GTCAAGATGA GGTCACGGAC GATTACATCG GAGACAACAC CACAGTGGAC 120

TACACTTTGT TCGAGTCTTT GTGCTCCAAG AAGGACGTGC GGAACTTTAA AGCCTGGTTC 180

CTCCCTATCA TGTACTCCAT CATTTGTTTC GTGGGCCTAC TGGGCAATGG GCTGGTCGTG 240

TTGACCTATA TCTATTTCAA GAGGCTCAAG ACCATGACCG ATACCTACCT GCTCAACCTG 300

GCGGTGGCAG ACATCCTCTT CCTCCTGACC CTTCCCTTCT GGGCCTACAG CGCGGCCAAG 360

TCCTGGGTCT TCGGTGTCCA CTTTTGCAAG CTCATCTTTG CCATCTACAA GATGAGCTTC 420

TTCAGTGGCA TGCTCCTACT TCTTTGCATC AGCATTGACC GCTACGTGGC CATCGTCCAG 480

GCTGTCTCAG CTCACCGCCA CCGTGCCCGC GTCCTTCTCA TCAGCAAGCT GTCCTGTGTG 540

GGCATCTGGA TACTAGCCAC AGTGCTCTCC ATCCCAGAGC TCCTGTACAG TGACCTCCAG 600

AGGAGCAGCA GTGAGCAAGC GATGCGATGC TCTCTCATCA CAGAGCATGT GGAGGCCTTT 660

ATCACCATCC AGGTGGCCCA GATGGTGATC GGCTTTCTGG TCCCCCTGCT GGCCATGAGC 720

TTCTGTTACC TTGTCATCAT CCGCACCCTG CTCCAGGCAC GCAACTTTGA GCGCAACAAG 780

GCCAAAAAGG TGATCATCGC TGTGGTCGTG GTCTTCATAG TCTTCCAGCT GCCCTACAAT 840

GGGGTGGTCC TGGCCCAGAC GGTGGCCAAC TTCAACATCA CCAGTAGCAC CTGTGAGCTC 900

AGTAAGCAAC TCAACATCGC CTACGACGTC ACCTACAGCC TGGCCTGCGT CCGCTGCTGC 960

GTCAACCCTT TCTTGTACGC CTTCATCGGC GTCAAGTTCC GCAACGATCT CTTCAAGCTC 1020

TTCAAGGACC TGGGCTGCCT CAGCCAGGAG CAGCTCCGGC AGTGGTCTTC CTGTCGGCAC 1080

ATCCGGCGCT CCTCCATGAG TGTGGAGGCC GAGACCACCA CCACCTTCTC CCCATAG 1137

(205) INFORMATION FOR SEQ ID NO:204:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 378 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:204:

Met Asp Leu Gly Lys Pro Met Lys Ser Val Leu Val Val Ala Leu Leu
1 5 10 15

Val Ile Phe Gln Val Cys Leu Cys Gln Asp Glu Val Thr Asp Asp Tyr
20 25 30

Ile Gly Asp Asn Thr Thr Val Asp Tyr Thr Leu Phe Glu Ser Leu Cys
35 40 45

Ser Lys Lys Asp Val Arg Asn Phe Lys Ala Trp Phe Leu Pro Ile Met
50 55 60

Tyr Ser Ile Ile Cys Phe Val Gly Leu Leu Gly Asn Gly Leu Val Val
65 70 75 80

Leu Thr Tyr Ile Tyr Phe Lys Arg Leu Lys Thr Met Thr Asp Thr Tyr
85 90 95

Leu Leu Asn Leu Ala Val Ala Asp Ile Leu Phe Leu Leu Thr Leu Pro
100 105 110

Phe Trp Ala Tyr Ser Ala Ala Lys Ser Trp Val Phe Gly Val His Phe
115 120 125

Cys Lys Leu Ile Phe Ala Ile Tyr Lys Met Ser Phe Phe Ser Gly Met
130 135 140

Leu Leu Leu Leu Cys Ile Ser Ile Asp Arg Tyr Val Ala Ile Val Gln
145 150 155 160

Ala Val Ser Ala His Arg His Arg Ala Arg Val Leu Leu Ile Ser Lys
165 170 175

Leu Ser Cys Val Gly Ile Trp Ile Leu Ala Thr Val Leu Ser Ile Pro
180 185 190

Glu Leu Leu Tyr Ser Asp Leu Gln Arg Ser Ser Ser Glu Gln Ala Met
195 200 205

Arg Cys Ser Leu Ile Thr Glu His Val Glu Ala Phe Ile Thr Ile Gln
210 215 220

Val Ala Gln Met Val Ile Gly Phe Leu Val Pro Leu Leu Ala Met Ser
225 230 235 240

Phe Cys Tyr Leu Val Ile Ile Arg Thr Leu Leu Gln Ala Arg Asn Phe
245 250 255

Glu Arg Asn Lys Ala Lys Lys Val Ile Ile Ala Val Val Val Val Phe
260 265 270

Ile Val Phe Gln Leu Pro Tyr Asn Gly Val Val Leu Ala Gln Thr Val
275 280 285

Ala Asn Phe Asn Ile Thr Ser Ser Thr Cys Glu Leu Ser Lys Gln Leu
290 295 300

Asn Ile Ala Tyr Asp Val Thr Tyr Ser Leu Ala Cys Val Arg Cys Cys
305 310 315 320

Val Asn Pro Phe Leu Tyr Ala Phe Ile Gly Val Lys Phe Arg Asn Asp
325 330 335

Leu Phe Lys Leu Phe Lys Asp Leu Gly Cys Leu Ser Gln Glu Gln Leu
340 345 350

Arg Gln Trp Ser Ser Cys Arg His Ile Arg Arg Ser Ser Met Ser Val
355 360 365

Glu Ala Glu Thr Thr Thr Thr Phe Ser Pro
370 375

(206) INFORMATION FOR SEQ ID NO:205:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1086 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:205:

ATGGATATAC AAATGGCAAA CAATTTTACT CCGCCCTCTG CAACTCCTCA GGGAAATGAC 60

TGTGACCTCT ATGCACATCA CAGCACGGCC AGGATAGTAA TGCCTCTGCA TTACAGCCTC 120

GTCTTCATCA TTGGGCTCGT GGGAAACTTA CTAGCCTTGG TCGTCATTGT TCAAAACAGG 180

AAAAAAATCA ACTCTACCAC CCTCTATTCA ACAAATTTGG TGATTTCTGA TATACTTTTT 240

ACCACGGCTT TGCCTACACG AATAGCCTAC TATGCAATGG GCTTTGACTG GAGAATCGGA 300

GATGCCTTGT GTAGGATAAC TGCGCTAGTG TTTTACATCA ACACATATGC AGGTGTGAAC 360

TTTATGACCT GCCTGAGTAT TGACCGCTTC ATTGCTGTGG TGCACCCTCT ACGCTACAAC 420

AAGATAAAAA GGATTGAACA TGCAAAAGGC GTGTGCATAT TTGTCTGGAT TCTAGTATTT 480

GCTCAGACAC TCCCACTCCT CATCAACCCT ATGTCAAAGC AGGAGGCTGA AAGGATTACA 540

TGCATGGAGT ATCCAAACTT TGAAGAAACT AAATCTCTTC CCTGGATTCT GCTTGGGGCA 600

TGTTTCATAG GATATGTACT TCCACTTATA ATCATTCTCA TCTGCTATTC TCAGATCTGC 660

TGCAAACTCT TCAGAACTGC CAAACAAAAC CCACTCACTG AGAAATCTGG TGTAAACAAA 720

AAGGCTAAAA ACACAATTAT TCTTATTATT GTTGTGTTTG TTCTCTGTTT CACACCTTAC 780

CATGTTGCAA TTATTCAACA TATGATTAAG AAGCTTCGTT TCTCTAATTT CCTGGAATGT 840

AGCCAAAGAC ATTCGTTCCA GATTTCTCTG CACTTTACAG TATGCCTGAT GAACTTCAAT 900

TGCTGCATGG ACCCTTTTAT CTACTTCTTT GCATGTAAAG GGTATAAGAG AAAGGTTATG 960

AGGATGCTGA AACGGCAAGT CAGTGTATCG ATTTCTAGTG CTGTGAAGTC AGCCCCTGAA 1020

GAAAATTCAC GTGAAATGAC AGAAACGCAG ATGATGATAC ATTCCAAGTC TTCAAATGGA 1080

AAGTGA 1086

(207) INFORMATION FOR SEQ ID NO:206: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: not relevant 




(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:206:

Met Asp Ile Gln Met Ala Asn Asn Phe Thr Pro Pro Ser Ala Thr Pro
1 5 10 15

Gln Gly Asn Asp Cys Asp Leu Tyr Ala His His Ser Thr Ala Arg Ile
20 25 30

Val Met Pro Leu His Tyr Ser Leu Val Phe Ile Ile Gly Leu Val Gly
35 40 45

Asn Leu Leu Ala Leu Val Val Ile Val Gln Asn Arg Lys Lys Ile Asn
50 55 60

Ser Thr Thr Leu Tyr Ser Thr Asn Leu Val Ile Ser Asp Ile Leu Phe
65 70 75 80

Thr Thr Ala Leu Pro Thr Arg Ile Ala Tyr Tyr Ala Met Gly Phe Asp
85 90 95

Trp Arg Ile Gly Asp Ala Leu Cys Arg Ile Thr Ala Leu Val Phe Tyr
100 105 110

Ile Asn Thr Tyr Ala Gly Val Asn Phe Met Thr Cys Leu Ser Ile Asp
115 120 125

Arg Phe Ile Ala Val Val His Pro Leu Arg Tyr Asn Lys Ile Lys Arg
130 135 140

Ile Glu His Ala Lys Gly Val Cys Ile Phe Val Trp Ile Leu Val Phe
145 150 155 160

Ala Gln Thr Leu Pro Leu Leu Ile Asn Pro Met Ser Lys Gln Glu Ala
165 170 175

Glu Arg Ile Thr Cys Met Glu Tyr Pro Asn Phe Glu Glu Thr Lys Ser
180 185 190

Leu Pro Trp Ile Leu Leu Gly Ala Cys Phe Ile Gly Tyr Val Leu Pro
195 200 205

Leu Ile Ile Ile Leu Ile Cys Tyr Ser Gln Ile Cys Cys Lys Leu Phe
210 215 220

Arg Thr Ala Lys Gln Asn Pro Leu Thr Glu Lys Ser Gly Val Asn Lys
225 230 235 240

Lys Ala Lys Asn Thr Ile Ile Leu Ile Ile Val Val Phe Val Leu Cys
245 250 255

Phe Thr Pro Tyr His Val Ala Ile Ile Gln His Met Ile Lys Lys Leu
260 265 270

Arg Phe Ser Asn Phe Leu Glu Cys Ser Gln Arg His Ser Phe Gln Ile
275 280 285

Ser Leu His Phe Thr Val Cys Leu Met Asn Phe Asn Cys Cys Met Asp
290 295 300

Pro Phe Ile Tyr Phe Phe Ala Cys Lys Gly Tyr Lys Arg Lys Val Met
305 310 315 320

Arg Met Leu Lys Arg Gln Val Ser Val Ser Ile Ser Ser Ala Val Lys
325 330 335

Ser Ala Pro Glu Glu Asn Ser Arg Glu Met Thr Glu Thr Gln Met Met
340 345 350

Ile His Ser Lys Ser Ser Asn Gly Lys
355 360



(208) INFORMATION FOR SEQ ID NO:207:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1446 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:207:

ATGCGGTGGC TGTGGCCCCT GGCTGTCTCT CTTGCTGTGA TTTTGGCTGT GGGGCTAAGC 60

AGGGTCTCTG GGGGTGCCCC CCTGCACCTG GGCAGGCACA GAGCCGAGAC CCAGGAGCAG 120

CAGAGCCGAT CCAAGAGGGG CACCGAGGAT GAGGAGGCCA AGGGCGTGCA GCAGTATGTG 180

CCTGAGGAGT GGGCGGAGTA CCCCCGGCCC ATTCACCCTG CTGGCCTGCA GCCAACCAAG 240

CCCTTGGTGG CCACCAGCCC TAACCCCGAC AAGGATGGGG GCACCCCAGA CAGTGGGCAG 300

GAACTGAGGG GCAATCTGAC AGGGGCACCA GGGCAGAGGC TACAGATCCA GAACCCCCTG 360

TATCCGGTGA CCGAGAGCTC CTACAGTGCC TATGCCATCA TGCTTCTGGC GCTGGTGGTG 420

TTTGCGGTGG GCATTGTGGG CAACCTGTCG GTCATGTGCA TCGTGTGGCA CAGCTACTAC 480

CTGAAGAGCG CCTGGAACTC CATCCTTGCC AGCCTGGCCC TCTGGGATTT TCTGGTCCTC 540

TTTTTCTGCC TCCCTATTGT CATCTTCAAC GAGATCACCA AGCAGAGGCT ACTGGGTGAC 600

GTTTCTTGTC GTGCCGTGCC CTTCATGGAG GTCTCCTCTC TGGGAGTCAC GACTTTCAGC 660

CTCTGTGCCC TGGGCATTGA CCGCTTCCAC GTGGCCACCA GCACCCTGCC CAAGGTGAGG 720

CCCATCGAGC GGTGCCAATC CATCCTGGCC AAGTTGGCTG TCATCTGGGT GGGCTCCATG 780

ACGCTGGCTG TGCCTGAGCT CCTGCTGTGG CAGCTGGCAC AGGAGCCTGC CCCCACCATG 840

GGCACCCTGG ACTCATGCAT CATGAAACCC TCAGCCAGCC TGCCCGAGTC CCTGTATTCA 900

CTGGTGATGA CCTACCAGAA CGCCCGCATG TGGTGGTACT TTGGCTGCTA CTTCTGCCTG 960

CCCATCCTCT TCACAGTCAC CTGCCAGCTG GTGACATGGC GGGTGCGAGG CCCTCCAGGG 1020

AGGAAGTCAG AGTGCAGGGC CAGCAAGCAC GAGCAGTGTG AGAGCCAGCT CAAGAGCACC 1080

GTGGTGGGCC TGACCGTGGT CTACGCCTTC TGCACCCTCC CAGAGAACGT CTGCAACATC 1140

GTGGTGGCCT ACCTCTCCAC CGAGCTGACC CGCCAGACCC TGGACCTCCT GGGCCTCATC 1200

AACCAGTTCT CCACCTTCTT CAAGGGCGCC ATCACCCCAG TGCTGCTCCT TTGCATCTGC 1260

AGGCCGCTGG GCCAGGCCTT CCTGGACTGC TGCTGCTGCT GCTGCTGTGA GGAGTGCGGC 1320

GGGGCTTCGG AGGCCTCTGC TGCCAATGGG TCGGACAACA AGCTCAAGAC CGAGGTGTCC 1380

TCTTCCATCT ACTTCCACAA GCCCAGGGAG TCACCCCCAC TCCTGCCCCT GGGCACACCT 1440

TGCTGA 1446

(209) INFORMATION FOR SEQ ID NO:208:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 481 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:208:

Met Arg Trp Leu Trp Pro Leu Ala Val Ser Leu Ala Val Ile Leu Ala
1 5 10 15

Val Gly Leu Ser Arg Val Ser Gly Gly Ala Pro Leu His Leu Gly Arg 
20 25 30 

His Arg Ala Glu Thr Gln Glu Gln Gln Ser Arg Ser Lys Arg Gly Thr
35 40 45

Glu Asp Glu Glu Ala Lys Gly Val Gln Gln Tyr Val Pro Glu Glu Trp
50 55 60

Ala Glu Tyr Pro Arg Pro Ile His Pro Ala Gly Leu Gln Pro Thr Lys
65 70 75 80

Pro Leu Val Ala Thr Ser Pro Asn Pro Asp Lys Asp Gly Gly Thr Pro
85 90 95

Asp Ser Gly Gln Glu Leu Arg Gly Asn Leu Thr Gly Ala Pro Gly Gln
100 105 110

Arg Leu Gln Ile Gln Asn Pro Leu Tyr Pro Val Thr Glu Ser Ser Tyr
115 120 125

Ser Ala Tyr Ala Ile Met Leu Leu Ala Leu Val Val Phe Ala Val Gly
130 135 140

Ile Val Gly Asn Leu Ser Val Met Cys Ile Val Trp His Ser Tyr Tyr
145 150 155 160

Leu Lys Ser Ala Trp Asn Ser Ile Leu Ala Ser Leu Ala Leu Trp Asp
165 170 175

Phe Leu Val Leu Phe Phe Cys Leu Pro Ile Val Ile Phe Asn Glu Ile
180 185 190

Thr Lys Gln Arg Leu Leu Gly Asp Val Ser Cys Arg Ala Val Pro Phe
195 200 205

Met Glu Val Ser Ser Leu Gly Val Thr Thr Phe Ser Leu Cys Ala Leu
210 215 220

Gly Ile Asp Arg Phe His Val Ala Thr Ser Thr Leu Pro Lys Val Arg
225 230 235 240

Pro Ile Glu Arg Cys Gln Ser Ile Leu Ala Lys Leu Ala Val Ile Trp
245 250 255

Val Gly Ser Met Thr Leu Ala Val Pro Glu Leu Leu Leu Trp Gln Leu
260 265 270

Ala Gln Glu Pro Ala Pro Thr Met Gly Thr Leu Asp Ser Cys Ile Met
275 280 285

Lys Pro Ser Ala Ser Leu Pro Glu Ser Leu Tyr Ser Leu Val Met Thr
290 295 300

Tyr Gln Asn Ala Arg Met Trp Trp Tyr Phe Gly Cys Tyr Phe Cys Leu
305 310 315 320

Pro Ile Leu Phe Thr Val Thr Cys Gln Leu Val Thr Trp Arg Val Arg
325 330 335

Gly Pro Pro Gly Arg Lys Ser Glu Cys Arg Ala Ser Lys His Glu Gln
340 345 350

Cys Glu Ser Gln Leu Lys Ser Thr Val Val Gly Leu Thr Val Val Tyr
355 360 365

Ala Phe Cys Thr Leu Pro Glu Asn Val Cys Asn Ile Val Val Ala Tyr
370 375 380

Leu Ser Thr Glu Leu Thr Arg Gln Thr Leu Asp Leu Leu Gly Leu Ile
385 390 395 400

Asn Gln Phe Ser Thr Phe Phe Lys Gly Ala Ile Thr Pro Val Leu Leu
405 410 415

Leu Cys Ile Cys Arg Pro Leu Gly Gln Ala Phe Leu Asp Cys Cys Cys
420 425 430

Cys Cys Cys Cys Glu Glu Cys Gly Gly Ala Ser Glu Ala Ser Ala Ala
435 440 445

Asn Gly Ser Asp Asn Lys Leu Lys Thr Glu Val Ser Ser Ser Ile Tyr
450 455 460

Phe His Lys Pro Arg Glu Ser Pro Pro Leu Leu Pro Leu Gly Thr Pro
465 470 475 480

Cys 

(210) INFORMATION FOR SEQ ID NO:209:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1101 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:209:

ATGTGGAACG CGACGCCCAG CGAAGAGCCG GGGTTCAACC TCACACTGGC CGACCTGGAC 60

TGGGATGCTT CCCCCGGCAA CGACTCGCTG GGCGACGAGC TGCTGCAGCT CTTCCCCGCG 120

CCGCTGCTGG CGGGCGTCAC AGCCACCTGC GTGGCACTCT TCGTGGTGGG TATCGCTGGC 180

AACCTGCTCA CCATGCTGGT GGTGTCGCGC TTCCGCGAGC TGCGCACCAC CACCAACCTC 240

TACCTGTCCA GCATGGCCTT CTCCGATCTG CTCATCTTCC TCTGCATGCC CCTGGACCTC 300

GTTCGCCTCT GGCAGTACCG GCCCTGGAAC TTCGGCGACC TCCTCTGCAA ACTCTTCCAA 360

TTCGTCAGTG AGAGCTGCAC CTACGCCACG GTGCTCACCA TCACAGCGCT GAGCGTCGAG 420

CGCTACTTCG CCATCTGCTT CCCACTCCGG GCCAAGGTGG TGGTCACCAA GGGGCGGGTG 480

AAGCTGGTCA TCTTCGTCAT CTGGGCCGTG GCCTTCTGCA GCGCCGGGCC CATCTTCGTG 540

CTAGTCGGGG TGGAGCACGA GAACGGCACC GACCCTTGGG ACACCAACGA GTGCCGCCCC 600

ACCGAGTTTG CGGTGCGCTC TGGACTGCTC ACGGTCATGG TGTGGGTGTC CAGCATCTTC 660

TTCTTCCTTC CTGTCTTCTG TCTCACGGTC CTCTACAGTC TCATCGGCAG GAAGCTGTGG 720

CGGAGGAGGC GCGGCGATGC TGTCGTGGGT GCCTCGCTCA GGGACCAGAA CCACAAGCAA 780

ACCAAGAAAA TGCTGGCTGT AGTGGTGTTT GCCTTCATCC TCTGCTGGCT CCCCTTCCAC 840

GTAGGGCGAT ATTTATTTTC CAAATCCTTT GAGCCTGGCT CCTTGGAGAT TGCTCAGATC 900

AGCCAGTACT GCAACCTCGT GTCCTTTGTC CTCTTCTACC TCAGTGCTGC CATCAACCCC 960

ATTCTGTACA ACATCATGTC CAAGAAGTAC CGGGTGGCAG TGTTCAGACT TCTGGGATTC 1020

GAACCCTTCT CCCAGAGAAA GCTCTCCACT CTGAAAGATG AAAGTTCTCG GGCCTGGACA 1080

GAATCTAGTA TTAATACATG A 1101

(211) INFORMATION FOR SEQ ID NO:210:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 366 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: 

Met Trp Asn Ala Thr Pro Ser Glu Glu Pro Gly Phe Asn Leu Thr Leu
1 5 10 15

Ala Asp Leu Asp Trp Asp Ala Ser Pro Gly Asn Asp Ser Leu Gly Asp
20 25 30

Glu Leu Leu Gln Leu Phe Pro Ala Pro Leu Leu Ala Gly Val Thr Ala
35 40 45

Thr Cys Val Ala Leu Phe Val Val Gly Ile Ala Gly Asn Leu Leu Thr
50 55 60

Met Leu Val Val Ser Arg Phe Arg Glu Leu Arg Thr Thr Thr Asn Leu
65 70 75 80

Tyr Leu Ser Ser Met Ala Phe Ser Asp Leu Leu Ile Phe Leu Cys Met
85 90 95

Pro Leu Asp Leu Val Arg Leu Trp Gln Tyr Arg Pro Trp Asn Phe Gly
100 105 110

Asp Leu Leu Cys Lys Leu Phe Gln Phe Val Ser Glu Ser Cys Thr Tyr
115 120 125

Ala Thr Val Leu Thr Ile Thr Ala Leu Ser Val Glu Arg Tyr Phe Ala
130 135 140

Ile Cys Phe Pro Leu Arg Ala Lys Val Val Val Thr Lys Gly Arg Val
145 150 155 160

Lys Leu Val Ile Phe Val Ile Trp Ala Val Ala Phe Cys Ser Ala Gly
165 170 175

Pro Ile Phe Val Leu Val Gly Val Glu His Glu Asn Gly Thr Asp Pro
180 185 190

Trp Asp Thr Asn Glu Cys Arg Pro Thr Glu Phe Ala Val Arg Ser Gly
195 200 205

Leu Leu Thr Val Met Val Trp Val Ser Ser Ile Phe Phe Phe Leu Pro
210 215 220

Val Phe Cys Leu Thr Val Leu Tyr Ser Leu Ile Gly Arg Lys Leu Trp
225 230 235 240

Arg Arg Arg Arg Gly Asp Ala Val Val Gly Ala Ser Leu Arg Asp Gln
245 250 255

Asn His Lys Gln Thr Lys Lys Met Leu Ala Val Val Val Phe Ala Phe
260 265 270

Ile Leu Cys Trp Leu Pro Phe His Val Gly Arg Tyr Leu Phe Ser Lys
275 280 285

Ser Phe Glu Pro Gly Ser Leu Glu Ile Ala Gln Ile Ser Gln Tyr Cys
290 295 300

Asn Leu Val Ser Phe Val Leu Phe Tyr Leu Ser Ala Ala Ile Asn Pro
305 310 315 320

Ile Leu Tyr Asn Ile Met Ser Lys Lys Tyr Arg Val Ala Val Phe Arg
325 330 335

Leu Leu Gly Phe Glu Pro Phe Ser Gln Arg Lys Leu Ser Thr Leu Lys
340 345 350

Asp Glu Ser Ser Arg Ala Trp Thr Glu Ser Ser Ile Asn Thr
355 360 365



(212) INFORMATION FOR SEQ ID NO:211: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: 



ATGCGAGCCC CGGGCGCGCT TCTCGCCCGC ATGTCGCGGC TACTGCTTCT GCTACTGCTC 60

AAGGTGTCTG CCTCTTCTGC CCTCGGGGTC GCCCCTGCGT CCAGAAACGA AACTTGTCTG 120

GGGGAGAGCT GTGCACCTAC AGTGATCCAG CGCCGCGGCA GGGACGCCTG GGGACCGGGA 180

AATTCTGCAA GAGACGTTCT GCGAGCCCGA GCACCCAGGG AGGAGCAGGG GGCAGCGTTT 240

CTTGCGGGAC CCTCCTGGGA CCTGCCGGCG GCCCCGGGCC GTGACCCGGC TGCAGGCAGA 300

GGGGCGGAGG CGTCGGCAGC CGGACCCCCG GGACCTCCAA CCAGGCCACC TGGCCCCTGG 360

AGGTGGAAAG GTGCTCGGGG TCAGGAGCCT TCTGAAACTT TGGGGAGAGG GAACCCCACG 420

GCCCTCCAGC TCTTCCTTCA GATCTCAGAG GAGGAAGAGA AGGGTCCCAG AGGCGCTGGC 480

ATTTCCGGGC GTAGCCAGGA GCAGAGTGTG AAGACAGTCC CCGGAGCCAG CGATCTTTTT 540

TACTGGCCAA GGAGAGCCGG GAAACTCCAG GGTTCCCACC ACAAGCCCCT GTCCAAGACG 600

GCCAATGGAC TGGCGGGGCA CGAAGGGTGG ACAATTGCAC TCCCGGGCCG GGCGCTGGCC 660

CAGAATGGAT CCTTGGGTGA AGGAATCCAT GAGCCTGGGG GTCCCCGCCG GGGAAACAGC 720

ACGAACCGGC GTGTGAGACT GAAGAACCCC TTCTACCCGC TGACCCAGGA GTCCTATGGA 780

GCCTACGCGG TCATGTGTCT GTCCGTGGTG ATCTTCGGGA CCGGCATCAT TGGCAACCTG 840

GCGGTGATGT GCATCGTGTG CCACAACTAC TACATGCGGA GCATCTCCAA CTCCCTCTTG 900

GCCAACCTGG CCTTCTGGGA CTTTCTCATC ATCTTCTTCT GCCTTCCGCT GGTCATCTTC 960

CACGAGCTGA CCAAGAAGTG GCTGCTGGAG GACTTCTCCT GCAAGATCGT GCCCTATATA 1020

GAGGTCGCCT CTCTGGGAGT CACCACTTTC ACCTTATGTG CTCTGTGCAT AGACCGCTTC 1080

CGTGCTGCCA CCAACGTACA GATGTACTAC GAAATGATCG AAAATTGTTC CTCAACAACT 1140

GCCAAACTTG CTGTTATATG GGTGGGAGCT CTATTGTTAG CACTTCCAGA AGTTGTTCTC 1200

CGCCAGCTGA GCAAGGAGGA TTTGGGGTTT AGTGGCCGAG CTCCGGCAGA AAGGTGCATT 1260

ATTAAGATCT CTCCTGATTT ACCAGACACC ATCTATGTTC TAGCCCTCAC CTACGACAGT 1320

GCGAGACTGT GGTGGTATTT TGGCTGTTAC TTTTGTTTGC CCACGCTTTT CACCATCACC 1380

TGCTCTCTAG TGACTGCGAG GAAAATCCGC AAAGCAGAGA AAGCCTGTAC CCGAGGGAAT 1440

AAACGGCAGA TTCAACTAGA GAGTCAGATG AAGTGTACAG TAGTGGCACT GACCATTTTA 1500

TATGGATTTT GCATTATTCC TGAAAATATC TGCAACATTG TTACTGCCTA CATGGCTACA 1560

GGGGTTTCAC AGCAGACAAT GGACCTCCTT AATATCATCA GCCAGTTCCT TTTGTTCTTT 1620

AAGTCCTGTG TCACCCCAGT CCTCCTTTTC TGTCTCTGCA AACCCTTCAG TCGGGCCTTC 1680

ATGGAGTGCT GCTGCTGTTG CTGTGAGGAA TGCATTCAGA AGTCTTCAAC GGTGACCAGT 1740

GATGACAATG ACAACGAGTA CACCACGGAA CTCGAACTCT CGCCTTTCAG TACCATACGC 1800

CGTGAAATGT CCACTTTTGC TTCTGTCGGA ACTCATTGCT GA 1842
(213) INFORMATION FOR SEQ ID NO: 212:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 613 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: 

Met Arg Ala Pro Gly Ala Leu Leu Ala Arg Met Ser Arg Leu Leu Leu 
1 5 10 15 

Leu Leu Leu Leu Lys Val Ser Ala Ser Ser Ala Leu Gly Val Ala Pro 
20 25 30 

Ala Ser Arg Asn Glu Thr Cys Leu Gly Glu Ser Cys Ala Pro Thr Val 
35 40 45 

Ile Gln Arg Arg Gly Arg Asp Ala Trp Gly Pro Gly Asn Ser Ala Arg
50 55 60

Asp Val Leu Arg Ala Arg Ala Pro Arg Glu Glu Gln Gly Ala Ala Phe
65 70 75 80

Leu Ala Gly Pro Ser Trp Asp Leu Pro Ala Ala Pro Gly Arg Asp Pro
85 90 95

Ala Ala Gly Arg Gly Ala Glu Ala Ser Ala Ala Gly Pro Pro Gly Pro
100 105 110

Pro Thr Arg Pro Pro Gly Pro Trp Arg Trp Lys Gly Ala Arg Gly Gln
115 120 125

Glu Pro Ser Glu Thr Leu Gly Arg Gly Asn Pro Thr Ala Leu Gln Leu
130 135 140

Phe Leu Gln Ile Ser Glu Glu Glu Glu Lys Gly Pro Arg Gly Ala Gly
145 150 155 160

Ile Ser Gly Arg Ser Gln Glu Gln Ser Val Lys Thr Val Pro Gly Ala
165 170 175

Ser Asp Leu Phe Tyr Trp Pro Arg Arg Ala Gly Lys Leu Gln Gly Ser
180 185 190

His His Lys Pro Leu Ser Lys Thr Ala Asn Gly Leu Ala Gly His Glu
195 200 205

Gly Trp Thr Ile Ala Leu Pro Gly Arg Ala Leu Ala Gln Asn Gly Ser
210 215 220

Leu Gly Glu Gly Ile His Glu Pro Gly Gly Pro Arg Arg Gly Asn Ser
225 230 235 240

Thr Asn Arg Arg Val Arg Leu Lys Asn Pro Phe Tyr Pro Leu Thr Gln
245 250 255

Glu Ser Tyr Gly Ala Tyr Ala Val Met Cys Leu Ser Val Val Ile Phe
260 265 270

Gly Thr Gly Ile Ile Gly Asn Leu Ala Val Met Cys Ile Val Cys His
275 280 285

Asn Tyr Tyr Met Arg Ser Ile Ser Asn Ser Leu Leu Ala Asn Leu Ala
290 295 300

Phe Trp Asp Phe Leu Ile Ile Phe Phe Cys Leu Pro Leu Val Ile Phe
305 310 315 320

His Glu Leu Thr Lys Lys Trp Leu Leu Glu Asp Phe Ser Cys Lys Ile
325 330 335

Val Pro Tyr Ile Glu Val Ala Ser Leu Gly Val Thr Thr Phe Thr Leu
340 345 350

Cys Ala Leu Cys Ile Asp Arg Phe Arg Ala Ala Thr Asn Val Gln Met
355 360 365

Tyr Tyr Glu Met Ile Glu Asn Cys Ser Ser Thr Thr Ala Lys Leu Ala
370 375 380

Val Ile Trp Val Gly Ala Leu Leu Leu Ala Leu Pro Glu Val Val Leu
385 390 395 400

Arg Gln Leu Ser Lys Glu Asp Leu Gly Phe Ser Gly Arg Ala Pro Ala
405 410 415

Glu Arg Cys Ile Ile Lys Ile Ser Pro Asp Leu Pro Asp Thr Ile Tyr
420 425 430

Val Leu Ala Leu Thr Tyr Asp Ser Ala Arg Leu Trp Trp Tyr Phe Gly
435 440 445

Cys Tyr Phe Cys Leu Pro Thr Leu Phe Thr Ile Thr Cys Ser Leu Val
450 455 460

Thr Ala Arg Lys Ile Arg Lys Ala Glu Lys Ala Cys Thr Arg Gly Asn
465 470 475 480

Lys Arg Gln Ile Gln Leu Glu Ser Gln Met Lys Cys Thr Val Val Ala
485 490 495

Leu Thr Ile Leu Tyr Gly Phe Cys Ile Ile Pro Glu Asn Ile Cys Asn
500 505 510

Ile Val Thr Ala Tyr Met Ala Thr Gly Val Ser Gln Gln Thr Met Asp
515 520 525

Leu Leu Asn Ile Ile Ser Gln Phe Leu Leu Phe Phe Lys Ser Cys Val
530 535 540

Thr Pro Val Leu Leu Phe Cys Leu Cys Lys Pro Phe Ser Arg Ala Phe
545 550 555 560

Met Glu Cys Cys Cys Cys Cys Cys Glu Glu Cys Ile Gln Lys Ser Ser
565 570 575

Thr Val Thr Ser Asp Asp Asn Asp Asn Glu Tyr Thr Thr Glu Leu Glu
580 585 590

Leu Ser Pro Phe Ser Thr Ile Arg Arg Glu Met Ser Thr Phe Ala Ser
595 600 605

Val Gly Thr His Cys 
610 

(214) INFORMATION FOR SEQ ID NO: 213:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1248 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213:

ATGGTTTTTG CTCACAGAAT GGATAACAGC AAGCCACATT TGATTATTCC TACACTTCTG 60

GTGCCCCTCC AAAACCGCAG CTGCACTGAA ACAGCCACAC CTCTGCCAAG CCAATACCTG 120

ATGGAATTAA GTGAGGAGCA CAGTTGGATG AGCAACCAAA CAGACCTTCA CTATGTGCTG 180

AAACCCGGGG AAGTGGCCAC AGCCAGCATC TTCTTTGGGA TTCTGTGGTT GTTTTCTATC 240

TTCGGCAATT CCCTGGTTTG TTTGGTCATC CATAGGAGTA GGAGGACTCA GTCTACCACC 300

AACTACTTTG TGGTCTCCAT GGCATGTGCT GACCTTCTCA TCAGCGTTGC CAGCACGCCT 360

TTCGTCCTGC TCCAGTTCAC CACTGGAAGG TGGACGCTGG GTAGTGCAAC GTGCAAGGTT 420

GTGCGATATT TTCAATATCT CACTCCAGGT GTCCAGATCT ACGTTCTCCT CTCCATCTGC 480

ATAGACCGGT TCTACACCAT CGTCTATCCT CTGAGCTTCA AGGTGTCCAG AGAAAAAGCC 540

AAGAAAATGA TTGCGGCATC GTGGATCTTT GATGCAGGCT TTGTGACCCC TGTGCTCTTT 600

TTCTATGGCT CCAACTGGGA CAGTCATTGT AACTATTTCC TCCCCTCCTC TTGGGAAGGC 660

ACTGCCTACA CTGTCATCCA CTTCTTGGTG GGCTTTGTGA TTCCATCTGT CCTCATAATT 720

TTATTTTACC AAAAGGTCAT AAAATATATT TGGAGAATAG GCACAGATGG CCGAACGGTG 780

AGGAGGACAA TGAACATTGT CCCTCGGACA AAAGTGAAAA CTAAAAAGAT GTTCCTCATT 840

TTAAATCTGT TGTTTTTGCT CTCCTGGCTG CCTTTTCATG TAGCTCAGCT ATGGCACCCC 900

CATGAACAAG ACTATAAGAA AAGTTCCCTT GTTTTCACAG CTATCACATG GATATCCTTT 960

AGTTCTTCAG CCTCTAAACC TACTCTGTAT TCAATTTATA ATGCCAATTT TCGGAGAGGG 1020

ATGAAAGAGA CTTTTTGCAT GTCCTCTATG AAATGTTACC GAAGCAATGC CTATACTATC 1080

ACAACAAGTT CAAGGATGGC CAAAAAAAAC TACGTTGGCA TTTCAGAAAT CCCTTCCATG 1140

GCCAAAACTA TTACCAAAGA CTCGATCTAT GACTCATTTG ACAGAGAAGC CAAGGAAAAA 1200

AAGCTTGCTT GGCCCATTAA CTCAAATCCA CCAAATACTT TTGTCTAA 1248
(215) INFORMATION FOR SEQ ID NO:214: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 415 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:214: 

Met Val Phe Ala His Arg Met Asp Asn Ser Lys Pro His Leu Ile Ile
1 5 10 15

Pro Thr Leu Leu Val Pro Leu Gln Asn Arg Ser Cys Thr Glu Thr Ala
20 25 30

Thr Pro Leu Pro Ser Gln Tyr Leu Met Glu Leu Ser Glu Glu His Ser
35 40 45

Trp Met Ser Asn Gln Thr Asp Leu His Tyr Val Leu Lys Pro Gly Glu
50 55 60

Val Ala Thr Ala Ser Ile Phe Phe Gly Ile Leu Trp Leu Phe Ser Ile
65 70 75 80

Phe Gly Asn Ser Leu Val Cys Leu Val Ile His Arg Ser Arg Arg Thr
85 90 95

Gln Ser Thr Thr Asn Tyr Phe Val Val Ser Met Ala Cys Ala Asp Leu
100 105 110

Leu Ile Ser Val Ala Ser Thr Pro Phe Val Leu Leu Gln Phe Thr Thr
115 120 125

Gly Arg Trp Thr Leu Gly Ser Ala Thr Cys Lys Val Val Arg Tyr Phe
130 135 140

Gln Tyr Leu Thr Pro Gly Val Gln Ile Tyr Val Leu Leu Ser Ile Cys
145 150 155 160

Ile Asp Arg Phe Tyr Thr Ile Val Tyr Pro Leu Ser Phe Lys Val Ser
165 170 175

Arg Glu Lys Ala Lys Lys Met Ile Ala Ala Ser Trp Ile Phe Asp Ala
180 185 190

Gly Phe Val Thr Pro Val Leu Phe Phe Tyr Gly Ser Asn Trp Asp Ser
195 200 205

His Cys Asn Tyr Phe Leu Pro Ser Ser Trp Glu Gly Thr Ala Tyr Thr
210 215 220

Val Ile His Phe Leu Val Gly Phe Val Ile Pro Ser Val Leu Ile Ile
225 230 235 240

Leu Phe Tyr Gln Lys Val Ile Lys Tyr Ile Trp Arg Ile Gly Thr Asp
245 250 255

Gly Arg Thr Val Arg Arg Thr Met Asn Ile Val Pro Arg Thr Lys Val
260 265 270

Lys Thr Lys Lys Met Phe Leu Ile Leu Asn Leu Leu Phe Leu Leu Ser
275 280 285

Trp Leu Pro Phe His Val Ala Gln Leu Trp His Pro His Glu Gln Asp
290 295 300

Tyr Lys Lys Ser Ser Leu Val Phe Thr Ala Ile Thr Trp Ile Ser Phe
305 310 315 320

Ser Ser Ser Ala Ser Lys Pro Thr Leu Tyr Ser Ile Tyr Asn Ala Asn
325 330 335

Phe Arg Arg Gly Met Lys Glu Thr Phe Cys Met Ser Ser Met Lys Cys
340 345 350

Tyr Arg Ser Asn Ala Tyr Thr Ile Thr Thr Ser Ser Arg Met Ala Lys
355 360 365

Lys Asn Tyr Val Gly Ile Ser Glu Ile Pro Ser Met Ala Lys Thr Ile
370 375 380

Thr Lys Asp Ser Ile Tyr Asp Ser Phe Asp Arg Glu Ala Lys Glu Lys
385 390 395 400

Lys Leu Ala Trp Pro Ile Asn Ser Asn Pro Pro Asn Thr Phe Val
405 410 415

(216) INFORMATION FOR SEQ ID NO: 215: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1842 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:

ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 60

CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 120

GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGG 180

AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240

CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 300

TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 360

GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 420

AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480

CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC 540

AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600

CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660

CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATAAACT AACCATGTTT 720

GTGATCTTCC TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780

GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840

TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900

TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCTCT 960

GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020

CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080

ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140

CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200

TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260

GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320

CCTGCCTCTG TCCATTTCAA GGCTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380

AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440

CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAATG CTGCCACCAG CCACCCTAAA 1500

CCCATCAAGC CAGCTACCAG CCATGCTGAG CCCACCACTG CTGACTATCC CAAGCCTGCC 1560

ACTACCAGCC ACCCTAAGCC CGCTGCTGCT GACAACCCTG AGCTCTCTGC CTCCCATTGC 1620

CCCGAGATCC CTGCCATTGC CCACCCTGTG TCTGACGACA GTGACCTCCC TGAGTCGGCC 1680

TCTAGCCCTG CCGCTGGGCC CACCAAGCCT GCTGCCAGCC AGCTGGAGTC TGACACCATC 1740

GCTGACCTTC CTGACCCTAC TGTAGTCACT ACCAGTACCA ATGATTACCA TGATGTCGTG 1800

GTTGTTGATG TTGAAGATGA TCCTGATGAA ATGGCTGTGT GA 1842
(217) INFORMATION FOR SEQ ID NO: 216:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 613 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:216: 

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140

Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Lys Leu Thr Met Phe
225 230 235 240

Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Ser
305 310 315 320

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350 

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val 
355 360 365 

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly 
370 375 380 

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala 
385 390 395 400 

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly 
405 410 415 

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro 
420 425 430 

Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Ala 
435 440 445 



Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Asn Ala Ala Thr
485 490 495

Ser His Pro Lys Pro Ile Lys Pro Ala Thr Ser His Ala Glu Pro Thr
500 505 510

Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His Pro Lys Pro Ala
515 520 525

Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys Pro Glu Ile Pro
530 535 540

Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu Pro Glu Ser Ala
545 550 555 560

Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala Ser Gln Leu Glu
565 570 575

Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val Val Thr Thr Ser
580 585 590

Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val Glu Asp Asp Pro
595 600 605

Asp Glu Met Ala Val
610


(218) INFORMATION FOR SEQ ID NO: 217: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1854 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217: 



ATGGGGCCCA CCCTAGCGGT TCCCACCCCC TATGGCTGTA TTGGCTGTAA GCTACCCCAG 60

CCAGAATACC CACCGGCTCT AATCATCTTT ATGTTCTGCG CGATGGTTAT CACCATCGTT 120

GTAGACCTAA TCGGCAACTC CATGGTCATT TTGGCTGTGA CGAAGAACAA GAAGCTCCGG 180

AATTCTGGCA ACATCTTCGT GGTCAGTCTC TCTGTGGCCG ATATGCTGGT GGCCATCTAC 240

CCATACCCTT TGATGCTGCA TGCCATGTCC ATTGGGGGCT GGGATCTGAG CCAGTTACAG 300

TGCCAGATGG TCGGGTTCAT CACAGGGCTG AGTGTGGTCG GCTCCATCTT CAACATCGTG 360

GCAATCGCTA TCAACCGTTA CTGCTACATC TGCCACAGCC TCCAGTACGA ACGGATCTTC 420

AGTGTGCGCA ATACCTGCAT CTACCTGGTC ATCACCTGGA TCATGACCGT CCTGGCTGTC 480

CTGCCCAACA TGTACATTGG CACCATCGAG TACGATCCTC GCACCTACAC CTGCATCTTC 540

AACTATCTGA ACAACCCTGT CTTCACTGTT ACCATCGTCT GCATCCACTT CGTCCTCCCT 600

CTCCTCATCG TGGGTTTCTG CTACGTGAGG ATCTGGACCA AAGTGCTGGC GGCCCGTGAC 660

CCTGCAGGGC AGAATCCTGA CAACCAACTT GCTGAGGTTC GCAATAAACT AACCATGTTT 720

GTGATCTTCC TCCTCTTTGC AGTGTGCTGG TGCCCTATCA ACGTGCTCAC TGTCTTGGTG 780

GCTGTCAGTC CGAAGGAGAT GGCAGGCAAG ATCCCCAACT GGCTTTATCT TGCAGCCTAC 840

TTCATAGCCT ACTTCAACAG CTGCCTCAAC GCTGTGATCT ACGGGCTCCT CAATGAGAAT 900

TTCCGAAGAG AATACTGGAC CATCTTCCAT GCTATGCGGC ACCCTATCAT ATTCTTCTCT 960

GGCCTCATCA GTGATATTCG TGAGATGCAG GAGGCCCGTA CCCTGGCCCG CGCCCGTGCC 1020

CATGCTCGCG ACCAAGCTCG TGAACAAGAC CGTGCCCATG CCTGTCCTGC TGTGGAGGAA 1080

ACCCCGATGA ATGTCCGGAA TGTTCCATTA CCTGGTGATG CTGCAGCTGG CCACCCCGAC 1140

CGTGCCTCTG GCCACCCTAA GCCCCATTCC AGATCCTCCT CTGCCTATCG CAAATCTGCC 1200

TCTACCCACC ACAAGTCTGT CTTTAGCCAC TCCAAGGCTG CCTCTGGTCA CCTCAAGCCT 1260

GTCTCTGGCC ACTCCAAGCC TGCCTCTGGT CACCCCAAGT CTGCCACTGT CTACCCTAAG 1320

CCTGCCTCTG TCCATTTCAA GGCTGACTCT GTCCATTTCA AGGGTGACTC TGTCCATTTC 1380

AAGCCTGACT CTGTTCATTT CAAGCCTGCT TCCAGCAACC CCAAGCCCAT CACTGGCCAC 1440

CATGTCTCTG CTGGCAGCCA CTCCAAGTCT GCCTTCAGTG CTGCCACCAG CCACCCTAAA 1500

CCCACCACTG GCCACATCAA GCCAGCTACC AGCCATGCTG AGCCCACCAC TGCTGACTAT 1560

CCCAAGCCTG CCACTACCAG CCACCCTAAG CCCACTGCTG CTGACAACCC TGAGCTCTCT 1620

GCCTCCCATT GCCCCGAGAT CCCTGCCATT GCCCACCCTG TGTCTGACGA CAGTGACCTC 1680

CCTGAGTCGG CCTCTAGCCC TGCCGCTGGG CCCACCAAGC CTGCTGCCAG CCAGCTGGAG 1740

TCTGACACCA TCGCTGACCT TCCTGACCCT ACTGTAGTCA CTACCAGTAC CAATGATTAC 1800

CATGATGTCG TGGTTGTTGA TGTTGAAGAT GATCCTGATG AAATGGCTGT GTGA 1854
(219) INFORMATION FOR SEQ ID NO: 218:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 617 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218: 

Met Gly Pro Thr Leu Ala Val Pro Thr Pro Tyr Gly Cys Ile Gly Cys
1 5 10 15

Lys Leu Pro Gln Pro Glu Tyr Pro Pro Ala Leu Ile Ile Phe Met Phe
20 25 30

Cys Ala Met Val Ile Thr Ile Val Val Asp Leu Ile Gly Asn Ser Met
35 40 45

Val Ile Leu Ala Val Thr Lys Asn Lys Lys Leu Arg Asn Ser Gly Asn
50 55 60

Ile Phe Val Val Ser Leu Ser Val Ala Asp Met Leu Val Ala Ile Tyr
65 70 75 80

Pro Tyr Pro Leu Met Leu His Ala Met Ser Ile Gly Gly Trp Asp Leu
85 90 95

Ser Gln Leu Gln Cys Gln Met Val Gly Phe Ile Thr Gly Leu Ser Val
100 105 110

Val Gly Ser Ile Phe Asn Ile Val Ala Ile Ala Ile Asn Arg Tyr Cys
115 120 125

Tyr Ile Cys His Ser Leu Gln Tyr Glu Arg Ile Phe Ser Val Arg Asn
130 135 140

Thr Cys Ile Tyr Leu Val Ile Thr Trp Ile Met Thr Val Leu Ala Val
145 150 155 160

Leu Pro Asn Met Tyr Ile Gly Thr Ile Glu Tyr Asp Pro Arg Thr Tyr
165 170 175

Thr Cys Ile Phe Asn Tyr Leu Asn Asn Pro Val Phe Thr Val Thr Ile
180 185 190

Val Cys Ile His Phe Val Leu Pro Leu Leu Ile Val Gly Phe Cys Tyr
195 200 205

Val Arg Ile Trp Thr Lys Val Leu Ala Ala Arg Asp Pro Ala Gly Gln
210 215 220

Asn Pro Asp Asn Gln Leu Ala Glu Val Arg Asn Lys Leu Thr Met Phe
225 230 235 240 



Val Ile Phe Leu Leu Phe Ala Val Cys Trp Cys Pro Ile Asn Val Leu
245 250 255

Thr Val Leu Val Ala Val Ser Pro Lys Glu Met Ala Gly Lys Ile Pro
260 265 270

Asn Trp Leu Tyr Leu Ala Ala Tyr Phe Ile Ala Tyr Phe Asn Ser Cys
275 280 285

Leu Asn Ala Val Ile Tyr Gly Leu Leu Asn Glu Asn Phe Arg Arg Glu
290 295 300

Tyr Trp Thr Ile Phe His Ala Met Arg His Pro Ile Ile Phe Phe Ser
305 310 315 320

Gly Leu Ile Ser Asp Ile Arg Glu Met Gln Glu Ala Arg Thr Leu Ala
325 330 335

Arg Ala Arg Ala His Ala Arg Asp Gln Ala Arg Glu Gln Asp Arg Ala
340 345 350

His Ala Cys Pro Ala Val Glu Glu Thr Pro Met Asn Val Arg Asn Val
355 360 365

Pro Leu Pro Gly Asp Ala Ala Ala Gly His Pro Asp Arg Ala Ser Gly
370 375 380

His Pro Lys Pro His Ser Arg Ser Ser Ser Ala Tyr Arg Lys Ser Ala
385 390 395 400

Ser Thr His His Lys Ser Val Phe Ser His Ser Lys Ala Ala Ser Gly
405 410 415

His Leu Lys Pro Val Ser Gly His Ser Lys Pro Ala Ser Gly His Pro
420 425 430

Lys Ser Ala Thr Val Tyr Pro Lys Pro Ala Ser Val His Phe Lys Ala
435 440 445

Asp Ser Val His Phe Lys Gly Asp Ser Val His Phe Lys Pro Asp Ser
450 455 460

Val His Phe Lys Pro Ala Ser Ser Asn Pro Lys Pro Ile Thr Gly His
465 470 475 480

His Val Ser Ala Gly Ser His Ser Lys Ser Ala Phe Ser Ala Ala Thr
485 490 495

Ser His Pro Lys Pro Thr Thr Gly His Ile Lys Pro Ala Thr Ser His
500 505 510

Ala Glu Pro Thr Thr Ala Asp Tyr Pro Lys Pro Ala Thr Thr Ser His
515 520 525

Pro Lys Pro Thr Ala Ala Asp Asn Pro Glu Leu Ser Ala Ser His Cys
530 535 540

Pro Glu Ile Pro Ala Ile Ala His Pro Val Ser Asp Asp Ser Asp Leu
545 550 555 560

Pro Glu Ser Ala Ser Ser Pro Ala Ala Gly Pro Thr Lys Pro Ala Ala
565 570 575

Ser Gln Leu Glu Ser Asp Thr Ile Ala Asp Leu Pro Asp Pro Thr Val
580 585 590

Val Thr Thr Ser Thr Asn Asp Tyr His Asp Val Val Val Val Asp Val
595 600 605

Glu Asp Asp Pro Asp Glu Met Ala Val
610 615



(220) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1548 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:219: 

ATGGGACATA ACGGGAGCTG GATCTCTCCA AATGCCAGCG AGCCGCACAA CGCGTCCGGC 60

GCCGAGGCTG CGGGTGTGAA CCGCAGCGCG CTCGGGGAGT TCGGCGAGGC GCAGCTGTAC 120

CGCCAGTTCA CCACCACCGT GCAGGTCGTC ATCTTCATAG GCTCGCTGCT CGGAAACTTC 180

ATGGTGTTAT GGTCAACTTG CCGCACAACC GTGTTCAAAT CTGTCACCAA CAGGTTCATT 240

AAAAACCTGG CCTGCTCGGG GATTTGTGCC AGCCTGGTCT GTGTGCCCTT CGACATCATC 300

CTCAGCACCA GTCCTCACTG TTGCTGGTGG ATCTACACCA TGCTCTTCTG CAAGGTCGTC 360

AAATTTTTGC ACAAAGTATT CTGCTCTGTG ACCATCCTCA GCTTCCCTGC TATTGCTTTG 420

GACAGGTACT ACTCAGTCCT CTATCCACTG GAGAGGAAAA TATCTGATGC CAAGTCCCGT 480

GAACTGGTGA TGTACATCTG GGCCCATGCA GTGGTGGCCA GTGTCCCTGT GTTTGCAGTA 540

ACCAATGTGG CTGACATCTA TGCCACGTCC ACCTGCACGG AAGTCTGGAG CAACTCCTTG 600

GGCCACCTGG TGTACGTTCT GGTGTATAAC ATCACCACGG TCATTGTGCC TGTGGTGGTG 660

GTGTTCCTCT TCTTGATACT GATCCGACGG GCCCTGAGTG CCAGCCAGAA GAAGAAGGTC 720

ATCATAGCAG CGCTCCGGAC CCCACAGAAC ACCATCTCTA TTCCCTATGC CTCCCAGCGG 780

GAGGCCGAGC TGAAAGCCAC CCTGCTCTCC ATGGTGATGG TCTTCATCTT GTGTAGCGTG 840

CCCTATGCCA CCCTGGTCGT CTACCAGACT GTGCTCAATG TCCCTGACAC TTCCGTCTTC 900

TTGCTGCTCA CTGCTGTTTG GCTGCCCAAA GTCTCCCTGC TGGCAAACCC TGTTCTCTTT 960

CTTACTGTGA ACAAATCTGT CCGCAAGTGC TTGATAGGGA CCCTGGTGCA ACTACACCAC 1020

CGGTACAGTC GCCGTAATGT GGTCAGTACA GGGAGTGGCA TGGCTGAGGC CAGCCTGGAA 1080

CCCAGCATAC GCTCGGGTAG CCAGCTCCTG GAGATGTTCC ACATTGGGCA GCAGCAGATC 1140

TTTAAGCCCA CAGAGGATGA GGAAGAGAGT GAGGCCAAGT ACATTGGCTC AGCTGACTTC 1200

CAGGCCAAGG AGATATTTAG CACCTGCCTG GAGGGAGAGC AGGGGCCACA GTTTGCGCCC 1260

TCTGCCCCAC CCCTGAGCAC AGTGGACTCT GTATCCCAGG TGGCACCGGC AGCCCCTGTG 1320

GAACCTGAAA CATTCCCTGA TAAGTATTCC CTGCAGTTTG GCTTTGGGCC TTTTGAGTTG 1380

CCTCCTCAGT GGCTCTCAGA GACCCGAAAC AGCAAGAAGC GGCTGCTTCC CCCCTTGGGC 1440

AACACCCCAG AAGAGCTGAT CCAGACAAAG GTGCCCAAGG TAGGCAGGGT GGAGCGGAAG 1500

ATGAGCAGAA ACAATAAAGT GAGCATTTTT CCAAAGGTGG ATTCCTAG 1548
(221) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 515 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220: 

Met Gly His Asn Gly Ser Trp Ile Ser Pro Asn Ala Ser Glu Pro His
1 5 10 15

Asn Ala Ser Gly Ala Glu Ala Ala Gly Val Asn Arg Ser Ala Leu Gly
20 25 30

Glu Phe Gly Glu Ala Gln Leu Tyr Arg Gln Phe Thr Thr Thr Val Gln
35 40 45

Val Val Ile Phe Ile Gly Ser Leu Leu Gly Asn Phe Met Val Leu Trp
50 55 60

Ser Thr Cys Arg Thr Thr Val Phe Lys Ser Val Thr Asn Arg Phe Ile
65 70 75 80

Lys Asn Leu Ala Cys Ser Gly Ile Cys Ala Ser Leu Val Cys Val Pro
85 90 95

Phe Asp Ile Ile Leu Ser Thr Ser Pro His Cys Cys Trp Trp Ile Tyr
100 105 110

Thr Met Leu Phe Cys Lys Val Val Lys Phe Leu His Lys Val Phe Cys
115 120 125

Ser Val Thr Ile Leu Ser Phe Pro Ala Ile Ala Leu Asp Arg Tyr Tyr
130 135 140

Ser Val Leu Tyr Pro Leu Glu Arg Lys Ile Ser Asp Ala Lys Ser Arg
145 150 155 160

Glu Leu Val Met Tyr Ile Trp Ala His Ala Val Val Ala Ser Val Pro
165 170 175

Val Phe Ala Val Thr Asn Val Ala Asp Ile Tyr Ala Thr Ser Thr Cys
180 185 190

Thr Glu Val Trp Ser Asn Ser Leu Gly His Leu Val Tyr Val Leu Val
195 200 205

Tyr Asn Ile Thr Thr Val Ile Val Pro Val Val Val Val Phe Leu Phe
210 215 220

Leu Ile Leu Ile Arg Arg Ala Leu Ser Ala Ser Gln Lys Lys Lys Val
225 230 235 240

Ile Ile Ala Ala Leu Arg Thr Pro Gln Asn Thr Ile Ser Ile Pro Tyr
245 250 255

Ala Ser Gln Arg Glu Ala Glu Leu Lys Ala Thr Leu Leu Ser Met Val
260 265 270

Met Val Phe Ile Leu Cys Ser Val Pro Tyr Ala Thr Leu Val Val Tyr
275 280 285 



Gln Thr Val Leu Asn Val Pro Asp Thr Ser Val Phe Leu Leu Leu Thr
290 295 300

Ala Val Trp Leu Pro Lys Val Ser Leu Leu Ala Asn Pro Val Leu Phe
305 310 315 320

Leu Thr Val Asn Lys Ser Val Arg Lys Cys Leu Ile Gly Thr Leu Val
325 330 335

Gln Leu His His Arg Tyr Ser Arg Arg Asn Val Val Ser Thr Gly Ser
340 345 350

Gly Met Ala Glu Ala Ser Leu Glu Pro Ser Ile Arg Ser Gly Ser Gln
355 360 365

Leu Leu Glu Met Phe His Ile Gly Gln Gln Gln Ile Phe Lys Pro Thr
370 375 380

Glu Asp Glu Glu Glu Ser Glu Ala Lys Tyr Ile Gly Ser Ala Asp Phe
385 390 395 400

Gln Ala Lys Glu Ile Phe Ser Thr Cys Leu Glu Gly Glu Gln Gly Pro
405 410 415

Gln Phe Ala Pro Ser Ala Pro Pro Leu Ser Thr Val Asp Ser Val Ser
420 425 430

Gln Val Ala Pro Ala Ala Pro Val Glu Pro Glu Thr Phe Pro Asp Lys
435 440 445

Tyr Ser Leu Gln Phe Gly Phe Gly Pro Phe Glu Leu Pro Pro Gln Trp
450 455 460

Leu Ser Glu Thr Arg Asn Ser Lys Lys Arg Leu Leu Pro Pro Leu Gly
465 470 475 480

Asn Thr Pro Glu Glu Leu Ile Gln Thr Lys Val Pro Lys Val Gly Arg
485 490 495

Val Glu Arg Lys Met Ser Arg Asn Asn Lys Val Ser Ile Phe Pro Lys
500 505 510

Val Asp Ser 
515 

(222) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221: 

ATGAATCGGC ACCATCTGCA GGATCACTTT CTGGAAATAG ACAAGAAGAA CTGCTGTGTG 60

TTCCGAGATG ACTTCATTGC CAAGGTGTTG CCGCCGGTGT TGGGGCTGGA GTTTATCTTT 120

GGGCTTCTGG GCAATGGCCT TGCCCTGTGG ATTTTCTGTT TCCACCTCAA GTCCTGGAAA 180

TCCAGCCGGA TTTTCCTGTT CAACCTGGCA GTAGCTGACT TTCTACTGAT CATCTGCCTG 240

CCGTTCGTGA TGGACTACTA TGTGCGGCGT TCAGACTGGA AGTTTGGGGA CATCCCTTGC 300

CGGCTGGTGC TCTTCATGTT TGCCATGAAC CGCCAGGGCA GCATCATCTT CCTCACGGTG 360

GTGGCGGTAG ACAGGTATTT CCGGGTGGTC CATCCCCACC ACGCCCTGAA CAAGATCTCC 420

AATTGGACAG CAGCCATCAT CTCTTGCCTT CTGTGGGGCA TCACTGTTGG CCTAACAGTC 480

CACCTCCTGA AGAAGAAGTT GCTGATCCAG AATGGCCCTG CAAATGTGTG CATCAGCTTC 540

AGCATCTGCC ATACCTTCCG GTGGCACGAA GCTATGTTCC TCCTGGAGTT CCTCCTGCCC 600
[illegible] [illegible] [illegible] GCCTGCGGCA GAGACAAATG 660

GACCGGCATG CCAAGATCAA [illegible] [illegible] CATCGTCTTT 720

GTCATCTGCT TCCTTCCCAG [illegible] [illegible] [illegible] CCTGCACACT 780

TCGGGCACGC AGAATTGTGA AGTGTACCGC [illegible] TATCACTCTC 840

AGCTTCACCT ACATGAACAG [illegible] [illegible] CAGCCCATCC 900

TTTCCCAACT TCTTCTCCAC [illegible] [illegible] [illegible] GACAGGTGAG 960


C C AG AT AATA 


ACCGCAGCAC 


gagcgtcgag 


CTCACAGGGG 


ACCCCAACAA 


AACCAGAGGC 


1020 


GCTCCAGAGG 


CGTTAATGGC 


caactccggt 


GAGCCATGGA 


GCCCCTCTTA 


TCTGGGCCCA 


1080 


ACCTCAAATA 


ACCATTCCAA 


gaagggacat 


TGTCACCAAG 


AACCAGCATC 


TCTGGAGAAA 


1140 


CAGTTGGGCT 


GTTGCATCGA 


GTAA 








1164 



(223) INFORMATION FOR SEQ ID NO: 222:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:222: 

Met Asn Arg His His Leu Gln Asp His Phe Leu Glu Ile Asp Lys Lys
1 5 10 15

Asn Cys Cys Val Phe Arg Asp Asp Phe Ile Ala Lys Val Leu Pro Pro
20 25 30 

Val Leu Gly Leu Glu Phe Ile Phe Gly Leu Leu Gly Asn Gly Leu Ala
35 40 45 

Leu Trp Ile Phe Cys Phe His Leu Lys Ser Trp Lys Ser Ser Arg Ile
50 55 60 

Phe Leu Phe Asn Leu Ala Val Ala Asp Phe Leu Leu Ile Ile Cys Leu
65 70 75 80 

Pro Phe Val Met Asp Tyr Tyr Val Arg Arg Ser Asp Trp Lys Phe Gly 
85 90 95 

Asp Ile Pro Cys Arg Leu Val Leu Phe Met Phe Ala Met Asn Arg Gln
100 105 110

Gly Ser Ile Ile Phe Leu Thr Val Val Ala Val Asp Arg Tyr Phe Arg
115 120 125 
Val Val His Pro His His Ala Leu Asn Lys Ile Ser Asn Trp Thr Ala
130 135 140 

Ala Ile Ile Ser Cys Leu Leu Trp Gly Ile Thr Val Gly Leu Thr Val
145 150 155 160

His Leu Leu Lys Lys Lys Leu Leu Ile Gln Asn Gly Pro Ala Asn Val
165 170 175

Cys Ile Ser Phe Ser Ile Cys His Thr Phe Arg Trp His Glu Ala Met
180 185 190 

Phe Leu Leu Glu Phe Leu Leu Pro Leu Gly Ile Ile Leu Phe Cys Ser
195 200 205

Ala Arg Ile Ile Trp Ser Leu Arg Gln Arg Gln Met Asp Arg His Ala
210 215 220 

Lys Ile Lys Arg Ala Lys Thr Phe Ile Met Val Val Ala Ile Val Phe
225 230 235 240 

Val Ile Cys Phe Leu Pro Ser Val Val Val Arg Ile Arg Ile Phe Trp
245 250 255

Leu Leu His Thr Ser Gly Thr Gln Asn Cys Glu Val Tyr Arg Ser Val
260 265 270 

Asp Leu Ala Phe Phe Ile Thr Leu Ser Phe Thr Tyr Met Asn Ser Met
275 280 285

Leu Asp Pro Val Val Tyr Tyr Phe Ser Ser Pro Ser Phe Pro Asn Phe 
290 295 300 

Phe Ser Thr Leu Ile Asn Arg Cys Leu Gln Arg Lys Met Thr Gly Glu
305 310 315 320 

Pro Asp Asn Asn Arg Ser Thr Ser Val Glu Leu Thr Gly Asp Pro Asn
325 330 335

Lys Thr Arg Gly Ala Pro Glu Ala Leu Met Ala Asn Ser Gly Glu Pro 
340 345 350 

Trp Ser Pro Ser Tyr Leu Gly Pro Thr Ser Asn Asn His Ser Lys Lys 
355 360 365

Gly His Cys His Gln Glu Pro Ala Ser Leu Glu Lys Gln Leu Gly Cys
370 375 380 

Cys Ile Glu
385 

(224) INFORMATION FOR SEQ ID NO:223:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1212 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:

ATGGCTTGCA ATGGCAGTGC GGCCAGGGGG CACTTTGACC CTGAGGACTT GAACCTGACT 60

GACGAGGCAC TGAGACTCAA GTACCTGGGG CCCCAGCAGA CAGAGCTGTT CATGCCCATC 120

TGTGCCACAT ACCTGCTGAT CTTCGTGGTG GGCGCTGTGG GCAATGGGCT GACCTGTCTG 180

GTCATCCTGC GCCACAAGGC CATGCGCACG CCTACCAACT ACTACCTCTT CAGCCTGGCC 240

GTGTCGGACC TGCTGGTGCT GCTGGTGGGC CTGCCCCTGG AGCTCTATGA GATGTGGCAC 300

AACTACCCCT TCCTGCTGGG CGTTGGTGGC TGCTATTTCC GCACGCTACT GTTTGAGATG 360

GTCTGCCTGG CCTCAGTGCT CAACGTCACT GCCCTGAGCG TGGAACGCTA TGTGGCCGTG 420

GTGCACCCAC TCCAGGCCAG GTCCATGGTG ACGCGGGCCC ATGTGCGCCG AGTGCTTGGG 480

GCCGTCTGGG GTCTTGCCAT GCTCTGCTCC CTGCCCAACA CCAGCCTGCA CGGCATCCGG 540

CAGCTGCACG TGCCCTGCCG GGGCCCAGTG CCAGACTCAG CTGTTTGCAT GCTGGTCCGC 600

CCACGGGCCC TCTACAACAT GGTAGTGCAG ACCACCGCGC TGCTCTTCTT CTGCCTGCCC 660

ATGGCCATCA TGAGCGTGCT CTACCTGCTC ATTGGGCTGC GACTGCGGCG GGAGAGGCTG 720

CTGCTCATGC AGGAGGCCAA GGGCAGGGGC TCTGCAGCAG CCAGGTCCAG ATACACCTGC 780

AGGCTCCAGC AGCACGATCG GGGCCGGAGA CAAGTGAAGA AGATGCTGTT TGTCCTGGTC 840

GTGGTGTTTG GCATCTGCTG GGCCCCGTTC CACGCCGACC GCGTCATGTG GAGCGTCGTG 900

TCACAGTGGA CAGATGGCCT GCACCTGGCC TTCCAGCACG TGCACGTCAT CTCCGGCATC 960

TTCTTCTACC TGGGCTCGGC GGCCAACCCC GTGCTCTATA GCCTCATGTC CAGCCGCTTC 1020

CGAGAGACCT TCCAGGAGGC CCTGTGCCTC GGGGCCTGCT GCCATCGCCT CAGACCCCGC 1080

CACAGCTCCC ACAGCCTCAG CAGGATGACC ACAGGCAGCA CCCTGTGTGA TGTGGGCTCC 1140

CTGGGCAGCT GGGTCCACCC CCTGGCTGGG AACGATGGCC CAGAGGCGCA GCAAGAGACC 1200

GATCCATCCT GA 1212 
(225) INFORMATION FOR SEQ ID NO: 224:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 403 amino acids

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:

Met Ala Cys Asn Gly Ser Ala Ala Arg Gly His Phe Asp Pro Glu Asp
1 5 10 15

Leu Asn Leu Thr Asp Glu Ala Leu Arg Leu Lys Tyr Leu Gly Pro Gln
20 25 30 

Gln Thr Glu Leu Phe Met Pro Ile Cys Ala Thr Tyr Leu Leu Ile Phe
35 40 45

Val Val Gly Ala Val Gly Asn Gly Leu Thr Cys Leu Val Ile Leu Arg
50 55 60 

His Lys Ala Met Arg Thr Pro Thr Asn Tyr Tyr Leu Phe Ser Leu Ala 
65 70 75 80 

Val Ser Asp Leu Leu Val Leu Leu Val Gly Leu Pro Leu Glu Leu Tyr
85 90 95

Glu Met Trp His Asn Tyr Pro Phe Leu Leu Gly Val Gly Gly Cys Tyr 
100 105 110 

Phe Arg Thr Leu Leu Phe Glu Met Val Cys Leu Ala Ser Val Leu Asn 
115 120 125

Val Thr Ala Leu Ser Val Glu Arg Tyr Val Ala Val Val His Pro Leu 
130 135 140 

Gln Ala Arg Ser Met Val Thr Arg Ala His Val Arg Arg Val Leu Gly
145 150 155 160 

Ala Val Trp Gly Leu Ala Met Leu Cys Ser Leu Pro Asn Thr Ser Leu
165 170 175

His Gly Ile Arg Gln Leu His Val Pro Cys Arg Gly Pro Val Pro Asp
180 185 190 

Ser Ala Val Cys Met Leu Val Arg Pro Arg Ala Leu Tyr Asn Met Val 
195 200 205

Val Gln Thr Thr Ala Leu Leu Phe Phe Cys Leu Pro Met Ala Ile Met
210 215 220 

Ser Val Leu Tyr Leu Leu Ile Gly Leu Arg Leu Arg Arg Glu Arg Leu
225 230 235 240 

Leu Leu Met Gln Glu Ala Lys Gly Arg Gly Ser Ala Ala Ala Arg Ser
245 250 255

Arg Tyr Thr Cys Arg Leu Gln Gln His Asp Arg Gly Arg Arg Gln Val
260 265 270 

Lys Lys Met Leu Phe Val Leu Val Val Val Phe Gly Ile Cys Trp Ala
275 280 285 

Pro Phe His Ala Asp Arg Val Met Trp Ser Val Val Ser Gln Trp Thr
290 295 300

Asp Gly Leu His Leu Ala Phe Gln His Val His Val Ile Ser Gly Ile
305 310 315 320 

Phe Phe Tyr Leu Gly Ser Ala Ala Asn Pro Val Leu Tyr Ser Leu Met 
325 330 335

Ser Ser Arg Phe Arg Glu Thr Phe Gln Glu Ala Leu Cys Leu Gly Ala
340 345 350 

Cys Cys His Arg Leu Arg Pro Arg His Ser Ser His Ser Leu Ser Arg 
355 360 365 

Met Thr Thr Gly Ser Thr Leu Cys Asp Val Gly Ser Leu Gly Ser Trp
370 375 380

Val His Pro Leu Ala Gly Asn Asp Gly Pro Glu Ala Gln Gln Glu Thr
385 390 395 400 

Asp Pro Ser

(226) INFORMATION FOR SEQ ID NO: 225:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1098 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:225: 

ATGGGGAACA TCACTGCAGA CAACTCCTCG ATGAGCTGTA CCATCGACCA TACCATCCAC 60

CAGACGCTGG CCCCGGTGGT CTATGTTACC GTGCTGGTGG TGGGCTTCCC GGCCAACTGC 120

CTGTCCCTCT ACTTCGGCTA CCTGCAGATC AAGGCCCGGA ACGAGCTGGG CGTGTACCTG 180

TGCAACCTGA CGGTGGCCGA CCTCTTCTAC ATCTGCTCGC TGCCCTTCTG GCTGCAGTAC 240

GTGCTGCAGC ACGACAACTG GTCTCACGGC GACCTGTCCT GCCAGGTGTG CGGCATCCTC 300

CTGTACGAGA ACATCTACAT CAGCGTGGGC TTCCTCTGCT GCATCTCCGT GGACCGCTAC 360

CTGGCTGTGG CCCATCCCTT CCGCTTCCAC CAGTTCCGGA CCCTGAAGGC GGCCGTCGGC 420

GTCAGCGTGG TCATCTGGGC CAAGGAGCTG CTGACCAGCA TCTACTTCCT GATGCACGAG 480

GAGGTCATCG AGGACGAGAA CCAGCACCGC GTGTGCTTTG AGCACTACCC CATCCAGGCA 540

TGGCAGCGCG CCATCAACTA CTACCGCTTC CTGGTGGGCT TCCTCTTCCC CATCTGCCTG 600

CTGCTGGCGT CCTACCAGGG CATCCTGCGC GCCGTGCGCC GGAGCCACGG CACCCAGAAG 660

AGCCGCAAGG ACCAGATCAA GCGGCTGGTG CTCAGCACCG TGGTCATCTT CCTGGCCTGC 720

TTCCTGCCCT ACCACGTGTT GCTGCTGGTG CGCAGCGTCT GGGAGGCCAG CTGCGACTTC 780

GCCAAGGGCG TTTTCAACGC CTACCACTTC TCCCTCCTGC TCACCAGCTT CAACTGCGTC 840

GCCGACCCCG TGCTCTACTG CTTCGTCAGC GAGACCACCC ACCGGGACCT GGCCCGCCTC 900

CGCGGGGCCT GCCTGGCCTT CCTCACCTGC TCCAGGACCG GCCGGGCCAG GGAGGCCTAC 960

CCGCTGGGTG CCCCCGAGGC CTCCGGGAAA AGCGGGGCCC AGGGTGAGGA GCCCGAGCTG 1020

TTGACCAAGC TCCACCCGGC CTTCCAGACC CCTAACTCGC CAGGGTCGGG CGGGTTCCCC 1080

ACGGGCAGGT TGGCCTAG 1098

(227) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 365 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:

Met Gly Asn Ile Thr Ala Asp Asn Ser Ser Met Ser Cys Thr Ile Asp
1 5 10 15

His Thr Ile His Gln Thr Leu Ala Pro Val Val Tyr Val Thr Val Leu
20 25 30 

Val Val Gly Phe Pro Ala Asn Cys Leu Ser Leu Tyr Phe Gly Tyr Leu
35 40 45

Gln Ile Lys Ala Arg Asn Glu Leu Gly Val Tyr Leu Cys Asn Leu Thr
50 55 60 

Val Ala Asp Leu Phe Tyr Ile Cys Ser Leu Pro Phe Trp Leu Gln Tyr
65 70 75 80

Val Leu Gln His Asp Asn Trp Ser His Gly Asp Leu Ser Cys Gln Val
85 90 95 

Cys Gly Ile Leu Leu Tyr Glu Asn Ile Tyr Ile Ser Val Gly Phe Leu
100 105 110

Cys Cys Ile Ser Val Asp Arg Tyr Leu Ala Val Ala His Pro Phe Arg
115 120 125 

Phe His Gln Phe Arg Thr Leu Lys Ala Ala Val Gly Val Ser Val Val
130 135 140

Ile Trp Ala Lys Glu Leu Leu Thr Ser Ile Tyr Phe Leu Met His Glu
145 150 155 160 

Glu Val Ile Glu Asp Glu Asn Gln His Arg Val Cys Phe Glu His Tyr
165 170 175 

Pro Ile Gln Ala Trp Gln Arg Ala Ile Asn Tyr Tyr Arg Phe Leu Val
180 185 190

Gly Phe Leu Phe Pro Ile Cys Leu Leu Leu Ala Ser Tyr Gln Gly Ile
195 200 205 

Leu Arg Ala Val Arg Arg Ser His Gly Thr Gln Lys Ser Arg Lys Asp
210 215 220

Gln Ile Lys Arg Leu Val Leu Ser Thr Val Val Ile Phe Leu Ala Cys
225 230 235 240 

Phe Leu Pro Tyr His Val Leu Leu Leu Val Arg Ser Val Trp Glu Ala 
245 250 255 

Ser Cys Asp Phe Ala Lys Gly Val Phe Asn Ala Tyr His Phe Ser Leu
260 265 270

Leu Leu Thr Ser Phe Asn Cys Val Ala Asp Pro Val Leu Tyr Cys Phe 
275 280 285 

Val Ser Glu Thr Thr His Arg Asp Leu Ala Arg Leu Arg Gly Ala Cys 
290 295 300

Leu Ala Phe Leu Thr Cys Ser Arg Thr Gly Arg Ala Arg Glu Ala Tyr 
305 310 315 320 

Pro Leu Gly Ala Pro Glu Ala Ser Gly Lys Ser Gly Ala Gln Gly Glu
325 330 335 

Glu Pro Glu Leu Leu Thr Lys Leu His Pro Ala Phe Gln Thr Pro Asn
340 345 350

Ser Pro Gly Ser Gly Gly Phe Pro Thr Gly Arg Leu Ala 
355 360 365 

(228) INFORMATION FOR SEQ ID NO: 227: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1416 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227: 

ATGGATATTC TTTGTGAAGA AAATACTTCT TTGAGCTCAA CTACGAACTC CCTAATGCAA 60

TTAAATGATG ACAACAGGCT CTACAGTAAT GACTTTAACT CCGGAGAAGC TAACACTTCT 120

GATGCATTTA ACTGGACAGT CGACTCTGAA AATCGAACCA ACCTTTCCTG TGAAGGGTGC 180

CTCTCACCGT CGTGTCTCTC CTTACTTCAT CTCCAGGAAA AAAACTGGTC TGCTTTACTG 240

ACAGCCGTAG TGATTATTCT AACTATTGCT GGAAACATAC TCGTCATCAT GGCAGTGTCC 300

CTAGAGAAAA AGCTGCAGAA TGCCACCAAC TATTTCCTGA TGTCACTTGC CATAGCTGAT 360

ATGCTGCTGG GTTTCCTTGT CATGCCCGTG TCCATGTTAA CCATCCTGTA TGGGTACCGG 420

TGGCCTCTGC CGAGCAAGCT TTGTGCAGTC TGGATTTACC TGGACGTGCT CTTCTCCACG 480

GCCTCCATCA TGCACCTCTG CGCCATCTCG CTGGACCGCT ACGTCGCCAT CCAGAATCCC 540

ATCCACCACA GCCGCTTCAA CTCCAGAACT AAGGCATTTC TGAAAATCAT TGCTGTTTGG 600

ACCATATCAG TAGGTATATC CATGCCAATA CCAGTCTTTG GGCTACAGGA CGATTCGAAG 660

GTCTTTAAGG AGGGGAGTTG CTTACTCGCC GATGATAACT TTGTCCTGAT CGGCTCTTTT 720

GTGTCATTTT TCATTCCCTT AACCATCATG GTGATCACCT ACTTTCTAAC TATCAAGTCA 780

CTCCAGAAAG AAGCTACTTT GTGTGTAAGT GATCTTGGCA CACGGGCCAA ATTAGCTTCT 840

TTCAGCTTCC TCCCTCAGAG TTCTTTGTCT TCAGAAAAGC TCTTCCAGCG GTCGATCCAT 900

AGGGAGCCAG GGTCCTACAC AGGCAGGAGG ACTATGCAGT CCATCAGCAA TGAGCAAAAG 960

GCAAAGAAGG TGCTGGGCAT CGTCTTCTTC CTGTTTGTGG TGATGTGGTG CCCTTTCTTC 1020

ATCACAAACA TCATGGCCGT CATCTGCAAA GAGTCCTGCA ATGAGGATGT CATTGGGGCC 1080

CTGCTCAATG TGTTTGTTTG GATCGGTTAT CTCTCTTCAG CAGTCAACCC ACTAGTCTAC 1140

ACACTGTTCA ACAAGACCTA TAGGTCAGCC TTTTCACGGT ATATTCAGTG TCAGTACAAG 1200

GAAAACAAAA AACCATTGCA GTTAATTTTA GTGAACACAA TACCGGCTTT GGCCTACAAG 1260

TCTAGCCAAC TTCAAATGGG ACAAAAAAAG AATTCAAAGC AAGATGCCAA GACAACAGAT 1320

AATGACTGCT CAATGGTTGC TCTAGGAAAG CAGTATTCTG AAGAGGCTTC TAAAGACAAT 1380

AGCGACGGAG TGAATGAAAA GGTGAGCTGT GTGTGA 1416

(229) INFORMATION FOR SEQ ID NO: 228:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 amino acids

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:

Met Asp Ile Leu Cys Glu Glu Asn Thr Ser Leu Ser Ser Thr Thr Asn
1 5 10 15

Ser Leu Met Gln Leu Asn Asp Asp Asn Arg Leu Tyr Ser Asn Asp Phe
20 25 30 

Asn Ser Gly Glu Ala Asn Thr Ser Asp Ala Phe Asn Trp Thr Val Asp 
35 40 45 

Ser Glu Asn Arg Thr Asn Leu Ser Cys Glu Gly Cys Leu Ser Pro Ser
50 55 60

Cys Leu Ser Leu Leu His Leu Gln Glu Lys Asn Trp Ser Ala Leu Leu
65 70 75 80 

Thr Ala Val Val Ile Ile Leu Thr Ile Ala Gly Asn Ile Leu Val Ile
85 90 95

Met Ala Val Ser Leu Glu Lys Lys Leu Gln Asn Ala Thr Asn Tyr Phe
100 105 110

Leu Met Ser Leu Ala Ile Ala Asp Met Leu Leu Gly Phe Leu Val Met
115 120 125 

Pro Val Ser Met Leu Thr Ile Leu Tyr Gly Tyr Arg Trp Pro Leu Pro
130 135 140

Ser Lys Leu Cys Ala Val Trp Ile Tyr Leu Asp Val Leu Phe Ser Thr
145 150 155 160 

Ala Ser Ile Met His Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala
165 170 175

Ile Gln Asn Pro Ile His His Ser Arg Phe Asn Ser Arg Thr Lys Ala
180 185 190 

Phe Leu Lys Ile Ile Ala Val Trp Thr Ile Ser Val Gly Ile Ser Met
195 200 205 

Pro Ile Pro Val Phe Gly Leu Gln Asp Asp Ser Lys Val Phe Lys Glu

210 215 220 

Gly Ser Cys Leu Leu Ala Asp Asp Asn Phe Val Leu Ile Gly Ser Phe

225 230 235 240 
Val Ser Phe Phe Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Phe Leu
245 250 255 

Thr Ile Lys Ser Leu Gln Lys Glu Ala Thr Leu Cys Val Ser Asp Leu
260 265 270 

Gly Thr Arg Ala Lys Leu Ala Ser Phe Ser Phe Leu Pro Gln Ser Ser
275 280 285

Leu Ser Ser Glu Lys Leu Phe Gln Arg Ser Ile His Arg Glu Pro Gly
290 295 300 

Ser Tyr Thr Gly Arg Arg Thr Met Gln Ser Ile Ser Asn Glu Gln Lys
305 310 315 320

Ala Lys Lys Val Leu Gly Ile Val Phe Phe Leu Phe Val Val Met Trp
325 330 335 

Cys Pro Phe Phe Ile Thr Asn Ile Met Ala Val Ile Cys Lys Glu Ser
340 345 350 

Cys Asn Glu Asp Val Ile Gly Ala Leu Leu Asn Val Phe Val Trp Ile
355 360 365

Gly Tyr Leu Ser Ser Ala Val Asn Pro Leu Val Tyr Thr Leu Phe Asn 
370 375 380 

Lys Thr Tyr Arg Ser Ala Phe Ser Arg Tyr Ile Gln Cys Gln Tyr Lys
385 390 395 400

Glu Asn Lys Lys Pro Leu Gln Leu Ile Leu Val Asn Thr Ile Pro Ala
405 410 415 

Leu Ala Tyr Lys Ser Ser Gln Leu Gln Met Gly Gln Lys Lys Asn Ser
420 425 430 

Lys Gln Asp Ala Lys Thr Thr Asp Asn Asp Cys Ser Met Val Ala Leu
435 440 445

Gly Lys Gln Tyr Ser Glu Glu Ala Ser Lys Asp Asn Ser Asp Gly Val
450 455 460 

Asn Glu Lys Val Ser Cys Val 
465 470

(230) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1377 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229:

ATGGTGAACC TGAGGAATGC GGTGCATTCA TTCCTTGTGC ACCTAATTGG CCTATTGGTT 60

TGGCAATGTG ATATTTCTGT GAGCCCAGTA GCAGCTATAG TAACTGACAT TTTCAATACC 120

TCCGATGGTG GACGCTTCAA ATTCCCAGAC GGGGTACAAA ACTGGCCAGC ACTTTCAATC 180

GTCATCATAA TAATCATGAC AATAGGTGGC AACATCCTTG TGATCATGGC AGTAAGCATG 240

GAAAAGAAAC TGCACAATGC CACCAATTAC TTCTTAATGT CCCTAGCCAT TGCTGATATG 300

CTAGTGGGAC TACTTGTCAT GCCCCTGTCT CTCCTGGCAA TCCTTTATGA TTATGTCTGG 360

CCACTACCTA GATATTTGTG CCCCGTCTGG ATTTCTTTAG ATGTTTTATT TTCAACAGCG 420

TCCATCATGC ACCTCTGCGC TATATCGCTG GATCGGTATG TAGCAATACG TAATCCTATT 480

GAGCATAGCC GTTTCAATTC GCGGACTAAG GCCATCATGA AGATTGCTAT TGTTTGGGCA 540

ATTTCTATAG GTGTATCAGT TCCTATCCCT GTGATTGGAC TGAGGGACGA AGAAAAGGTG 600

TTCGTGAACA ACACGACGTG CGTGCTCAAC GACCCAAATT TCGTTCTTAT TGGGTCCTTC 660

GTAGCTTTCT TCATACCGCT GACGATTATG GTGATTACGT ATTGCCTGAC CATCTACGTT 720

CTGCGCCGAC AAGCTTTGAT GTTACTGCAC GGCCACACCG AGGAACCGCC TGGACTAAGT 780

CTGGATTTCC TGAAGTGCTG CAAGAGGAAT ACGGCCGAGG AAGAGAACTC TGCAAACCCT 840

AACCAAGACC AGAACGCACG CCGAAGAAAG AAGAAGGAGA GACGTCCTAG GGGCACCATG 900

CAGGCTATCA ACAATGAAAG AAAAGCTAAG AAAGTCCTTG GGATTGTTTT CTTTGTGTTT 960

CTGATCATGT GGTGCCCATT TTTCATTACC AATATTCTGT CTGTTCTTTG TGAGAAGTCC 1020

TGTAACCAAA AGCTCATGGA AAAGCTTCTG AATGTGTTTG TTTGGATTGG CTATGTTTGT 1080

TCAGGAATCA ATCCTCTGGT GTATACTCTG TTCAACAAAA TTTACCGAAG GGCATTCTCC 1140

AACTATTTGC GTTGCAATTA TAAGGTAGAG AAAAAGCCTC CTGTCAGGCA GATTCCAAGA 1200

GTTGCCGCCA CTGCTTTGTC TGGGAGGGAG CTTAATGTTA ACATTTATCG GCATACCAAT 1260

GAACCGGTGA TCGAGAAAGC CAGTGACAAT GAGCCCGGTA TAGAGATGCA AGTTGAGAAT 1320

TTAGAGTTAC CAGTAAATCC CTCCAGTGTG GTTAGCGAAA GGATTAGCAG TGTGTGA 1377

(231) INFORMATION FOR SEQ ID NO: 230: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230: 

Met Val Asn Leu Arg Asn Ala Val His Ser Phe Leu Val His Leu Ile
1 5 10 15

Gly Leu Leu Val Trp Gln Cys Asp Ile Ser Val Ser Pro Val Ala Ala
20 25 30 

Ile Val Thr Asp Ile Phe Asn Thr Ser Asp Gly Gly Arg Phe Lys Phe
35 40 45 

Pro Asp Gly Val Gln Asn Trp Pro Ala Leu Ser Ile Val Ile Ile Ile
50 55 60

Ile Met Thr Ile Gly Gly Asn Ile Leu Val Ile Met Ala Val Ser Met
65 70 75 80 

Glu Lys Lys Leu His Asn Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala 
85 90 95

Ile Ala Asp Met Leu Val Gly Leu Leu Val Met Pro Leu Ser Leu Leu
100 105 110 

Ala Ile Leu Tyr Asp Tyr Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro
115 120 125 

Val Trp Ile Ser Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His
130 135 140

Leu Cys Ala Ile Ser Leu Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile
145 150 155 160 

Glu His Ser Arg Phe Asn Ser Arg Thr Lys Ala Ile Met Lys Ile Ala
165 170 175

Ile Val Trp Ala Ile Ser Ile Gly Val Ser Val Pro Ile Pro Val Ile
180 185 190 

Gly Leu Arg Asp Glu Glu Lys Val Phe Val Asn Asn Thr Thr Cys Val 
195 200 205 

Leu Asn Asp Pro Asn Phe Val Leu Ile Gly Ser Phe Val Ala Phe Phe
210 215 220

Ile Pro Leu Thr Ile Met Val Ile Thr Tyr Cys Leu Thr Ile Tyr Val
225 230 235 240 

Leu Arg Arg Gln Ala Leu Met Leu Leu His Gly His Thr Glu Glu Pro
245 250 255

Pro Gly Leu Ser Leu Asp Phe Leu Lys Cys Cys Lys Arg Asn Thr Ala 
260 265 270

Glu Glu Glu Asn Ser Ala Asn Pro Asn Gln Asp Gln Asn Ala Arg Arg
275 280 285 

Arg Lys Lys Lys Glu Arg Arg Pro Arg Gly Thr Met Gln Ala Ile Asn
290 295 300

Asn Glu Arg Lys Ala Lys Lys Val Leu Gly Ile Val Phe Phe Val Phe
305 310 315 320 

Leu Ile Met Trp Cys Pro Phe Phe Ile Thr Asn Ile Leu Ser Val Leu
325 330 335 

Cys Glu Lys Ser Cys Asn Gln Lys Leu Met Glu Lys Leu Leu Asn Val
340 345 350

Phe Val Trp Ile Gly Tyr Val Cys Ser Gly Ile Asn Pro Leu Val Tyr
355 360 365 

Thr Leu Phe Asn Lys Ile Tyr Arg Arg Ala Phe Ser Asn Tyr Leu Arg
370 375 380

Cys Asn Tyr Lys Val Glu Lys Lys Pro Pro Val Arg Gln Ile Pro Arg
385 390 395 400 

Val Ala Ala Thr Ala Leu Ser Gly Arg Glu Leu Asn Val Asn Ile Tyr
405 410 415 

Arg His Thr Asn Glu Pro Val Ile Glu Lys Ala Ser Asp Asn Glu Pro
420 425 430

Gly Ile Glu Met Gln Val Glu Asn Leu Glu Leu Pro Val Asn Pro Ser
435 440 445 

Ser Val Val Ser Glu Arg Ile Ser Ser Val
450 455

(232) INFORMATION FOR SEQ ID NO: 231:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1068 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231: 

ATGGATCAGT TCCCTGAATC AGTGACAGAA AACTTTGAGT ACGATGATTT GGCTGAGGCC 60

TGTTATATTG GGGACATCGT GGTCTTTGGG ACTGTGTTCC TGTCCATATT CTACTCCGTC 120

ATCTTTGCCA TTGGCCTGGT GGGAAATTTG TTGGTAGTGT TTGCCCTCAC CAACAGCAAG 180

AAGCCCAAGA GTGTCACCGA CATTTACCTC CTGAACCTGG CCTTGTCTGA TCTGCTGTTT 240

GTAGCCACTT TGCCCTTCTG GACTCACTAT TTGATAAATG AAAAGGGCCT CCACAATGCC 300

ATGTGCAAAT TCACTACCGC CTTCTTCTTC ATCGGCTTTT TTGGAAGCAT ATTCTTCATC 360

ACCGTCATCA GCATTGATAG GTACCTGGCC ATCGTCCTGG CCGCCAACTC CATGAACAAC 420

CGGACCGTGC AGCATGGCGT CACCATCAGC CTAGGCGTCT GGGCAGCAGC CATTTTGGTG 480

GCAGCACCCC AGTTCATGTT CACAAAGCAG AAAGAAAATG AATGCCTTGG TGACTACCCC 540

GAGGTCCTCC AGGAAATCTG GCCCGTGCTC CGCAATGTGG AAACAAATTT TCTTGGCTTC 600

CTACTCCCCC TGCTCATTAT GAGTTATTGC TACTTCAGAA TCATCCAGAC GCTGTTTTCC 660

TGCAAGAACC ACAAGAAAGC CAAAGCCAAG AAACTGATCC TTCTGGTGGT CATCGTGTTT 720

TTCCTCTTCT GGACACCCTA CAACGTTATG ATTTTCCTGG AGACGCTTAA GCTCTATGAC 780

TTCTTTCCCA GTTGTGACAT GAGGAAGGAT CTGAGGCTGG CCCTCAGTGT GACTGAGACG 840

GTTGCATTTA GCCATTGTTG CCTGAATCCT CTCATCTATG CATTTGCTGG GGAGAAGTTC 900

AGAAGATACC TTTACCACCT GTATGGGAAA TGCCTGGCTG TCCTGTGTGG GCGCTCAGTC 960

CACGTTGATT TCTCCTCATC TGAATCACAA AGGAGCAGGC ATGGAAGTGT TCTGAGCAGC 1020

AATTTTACTT ACCACACGAG TGATGGAGAT GCATTGCTCC TTCTCTGA 1068

(233) INFORMATION FOR SEQ ID NO:232: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:232: 

Met Asp Gln Phe Pro Glu Ser Val Thr Glu Asn Phe Glu Tyr Asp Asp
1 5 10 15

Leu Ala Glu Ala Cys Tyr Ile Gly Asp Ile Val Val Phe Gly Thr Val
20 25 30 

Phe Leu Ser Ile Phe Tyr Ser Val Ile Phe Ala Ile Gly Leu Val Gly
35 40 45 

Asn Leu Leu Val Val Phe Ala Leu Thr Asn Ser Lys Lys Pro Lys Ser
50 55 60

Val Thr Asp Ile Tyr Leu Leu Asn Leu Ala Leu Ser Asp Leu Leu Phe
65 70 75 80

Val Ala Thr Leu Pro Phe Trp Thr His Tyr Leu Ile Asn Glu Lys Gly
85 90 95 

Leu His Asn Ala Met Cys Lys Phe Thr Thr Ala Phe Phe Phe Ile Gly
100 105 110

Phe Phe Gly Ser Ile Phe Phe Ile Thr Val Ile Ser Ile Asp Arg Tyr
115 120 125 

Leu Ala Ile Val Leu Ala Ala Asn Ser Met Asn Asn Arg Thr Val Gln
130 135 140 

His Gly Val Thr Ile Ser Leu Gly Val Trp Ala Ala Ala Ile Leu Val
145 150 155 160

Ala Ala Pro Gln Phe Met Phe Thr Lys Gln Lys Glu Asn Glu Cys Leu
165 170 175 

Gly Asp Tyr Pro Glu Val Leu Gln Glu Ile Trp Pro Val Leu Arg Asn
180 185 190

Val Glu Thr Asn Phe Leu Gly Phe Leu Leu Pro Leu Leu Ile Met Ser
195 200 205 

Tyr Cys Tyr Phe Arg Ile Ile Gln Thr Leu Phe Ser Cys Lys Asn His
210 215 220 

Lys Lys Ala Lys Ala Lys Lys Leu Ile Leu Leu Val Val Ile Val Phe
225 230 235 240 

Phe Leu Phe Trp Thr Pro Tyr Asn Val Met Ile Phe Leu Glu Thr Leu
245 250 255 

Lys Leu Tyr Asp Phe Phe Pro Ser Cys Asp Met Arg Lys Asp Leu Arg 
260 265 270 

Leu Ala Leu Ser Val Thr Glu Thr Val Ala Phe Ser His Cys Cys Leu 
275 280 285 

Asn Pro Leu Ile Tyr Ala Phe Ala Gly Glu Lys Phe Arg Arg Tyr Leu
290 295 300 

Tyr His Leu Tyr Gly Lys Cys Leu Ala Val Leu Cys Gly Arg Ser Val 
305 310 315 320 

His Val Asp Phe Ser Ser Ser Glu Ser Gln Arg Ser Arg His Gly Ser
325 330 335 

Val Leu Ser Ser Asn Phe Thr Tyr His Thr Ser Asp Gly Asp Ala Leu 
340 345 350 

Leu Leu Leu 
355 
(234) INFORMATION FOR SEQ ID NO:233:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:233: 
GGCTTAAGAG CATCATCGTG GTGCTGGTG 29

(235) INFORMATION FOR SEQ ID NO:234: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:234: 
GTCACCACCA GCACCACGAT GATGCTCTTA AGCC 34

(236) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:235: 

CAAAGAAAGT ACTGGGCATC GTCTTCTTCC T 31 
(237) INFORMATION FOR SEQ ID NO:236:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:
TGCTCTAGAT TCCAGATAGG TGAAAACTTG 
(238) INFORMATION FOR SEQ ID NO: 237:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:237: 
CTAGGGGCAC CATGCAGGCT ATCAACAATG AAAGAAAAGC TAAGAAAGTC 

(239) INFORMATION FOR SEQ ID NO: 238: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 50 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
CAAGGACTTT CTTAGCTTTT CTTTCATTGT TGATAGCCTG CATGGTGCCC 

(240) INFORMATION FOR SEQ ID NO: 239: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:

CGGCGGCAGA AGGCGAAACG CATGATCCTC GCGGT

(241) INFORMATION FOR SEQ ID NO:240: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240:
ACCGCGAGGA TCATGCGTTT CGCCTTCTGC CGCCG 35

(242) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:241: 
GAGACATATT ATCTGCCACG GAGG 24 
(243) INFORMATION FOR SEQ ID NO: 242:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:242: 
TTGGCATAGA AACCGGACCC AAGG 24 

(244) INFORMATION FOR SEQ ID NO:243: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:243: 
TAAGAATTCC ATAAAAATTA TGGAATGG 28

(245) INFORMATION FOR SEQ ID NO: 244:
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244: 

CCAGGATCCA GCTGAAGTCT TCCATCATTC 

(246) INFORMATION FOR SEQ ID NO: 245: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1071 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:

ATGAATGGGG TCTCGGAGGG GACCAGAGGC TGCAGTGACA GGCAACCTGG GGTCCTGACA 60

CGTGATCGCT CTTGTTCCAG GAAGATGAAC TCTTCCGGAT GCCTGTCTGA GGAGGTGGGG 120

TCCCTCCGCC CACTGACTGT GGTTATCCTG TCTGCGTCCA TTGTCGTCGG AGTGCTGGGC 180

AATGGGCTGG TGCTGTGGAT GACTGTCTTC CGTATGGCAC GCACGGTCTC CACCGTCTGC 240

TTCTTCCACC TGGCCCTTGC CGATTTCATG CTCTCACTGT CTCTGCCCAT TGCCATGTAC 300

TATATTGTCT CCAGGCAGTG GCTCCTCGGA GAGTGGGCCT GCAAACTCTA CATCACCTTT 360

GTGTTCCTCA GCTACTTTGC CAGTAACTGC CTCCTTGTCT TCATCTCTGT GGACCGTTGC 420

ATCTCTGTCC TCTACCCCGT CTGGGCCCTG AACCACCGCA CTGTGCAGCG GGCGAGCTGG 480

CTGGCCTTTG GGGTGTGGCT CCTGGCCGCC GCCTTGTGCT CTGCGCACCT GAAATTCCGG 540

ACAACCAGAA AATGGAATGG CTGTACGCAC TGCTACTTGG CGTTCAACTC TGACAATGAG 600

ACTGCCCAGA TTTGGATTGA AGGGGTCGTG GAGGGACACA TTATAGGGAC CATTGGCCAC 660

TTCCTGCTGG GCTTCCTGGG GCCCTTAGCA ATCATAGGCA CCTGCGCCCA CCTCATCCGG 720

GCCAAGCTCT TGCGGGAGGG CTGGGTCCAT GCCAACCGGC CCGCGAGGCT GCTGCTGGTG 780

CTGGTGAGCG CTTTCTTTAT CTTCTGGTCC CCGTTTAACG TGGTGCTGTT GGTCCATCTG 840

TGGCGACGGG TGATGCTCAA GGAAATCTAC CACCCCCGGA TGCTGCTCAT CCTCCAGGCT 900

AGCTTTGCCT TGGGCTGTGT CAACAGCAGC CTCAACCCCT TCCTCTACGT CTTCGTTGGC 960

AGAGATTTCC AAGAAAAGTT TTTCCAGTCT TTGACTTCTG CCCTGGCGAG GGCGTTTGGA 1020

GAGGAGGAGT TTCTGTCATC CTGTCCCCGT GGCAACGCCC CCCGGGAATG A 1071
(247) INFORMATION FOR SEQ ID NO: 246: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 356 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:

Met Asn Gly Val Ser Glu Gly Thr Arg Gly Cys Ser Asp Arg Gln Pro
1 5 10 15

Gly Val Leu Thr Arg Asp Arg Ser Cys Ser Arg Lys Met Asn Ser Ser 
20 25 30 

Gly Cys Leu Ser Glu Glu Val Gly Ser Leu Arg Pro Leu Thr Val Val
35 40 45

Ile Leu Ser Ala Ser Ile Val Val Gly Val Leu Gly Asn Gly Leu Val
50 55 60 

Leu Trp Met Thr Val Phe Arg Met Ala Arg Thr Val Ser Thr Val Cys 
65 70 75 80

Phe Phe His Leu Ala Leu Ala Asp Phe Met Leu Ser Leu Ser Leu Pro 
85 90 95 

Ile Ala Met Tyr Tyr Ile Val Ser Arg Gln Trp Leu Leu Gly Glu Trp
100 105 110 

Ala Cys Lys Leu Tyr Ile Thr Phe Val Phe Leu Ser Tyr Phe Ala Ser
115 120 125

Asn Cys Leu Leu Val Phe Ile Ser Val Asp Arg Cys Ile Ser Val Leu
130 135 140 

Tyr Pro Val Trp Ala Leu Asn His Arg Thr Val Gln Arg Ala Ser Trp
145 150 155 160

Leu Ala Phe Gly Val Trp Leu Leu Ala Ala Ala Leu Cys Ser Ala His 
165 170 175 

Leu Lys Phe Arg Thr Thr Arg Lys Trp Asn Gly Cys Thr His Cys Tyr 
180 185 190 

Leu Ala Phe Asn Ser Asp Asn Glu Thr Ala Gln Ile Trp Ile Glu Gly
195 200 205

Val Val Glu Gly His Ile Ile Gly Thr Ile Gly His Phe Leu Leu Gly
210 215 220 

Phe Leu Gly Pro Leu Ala Ile Ile Gly Thr Cys Ala His Leu Ile Arg
225 230 235 240 

Ala Lys Leu Leu Arg Glu Gly Trp Val His Ala Asn Arg Pro Ala Arg
245 250 255

Leu Leu Leu Val Leu Val Ser Ala Phe Phe Ile Phe Trp Ser Pro Phe
260 265 270 

Asn Val Val Leu Leu Val His Leu Trp Arg Arg Val Met Leu Lys Glu 
275 280 285

Ile Tyr His Pro Arg Met Leu Leu Ile Leu Gln Ala Ser Phe Ala Leu
290 295 300 

Gly Cys Val Asn Ser Ser Leu Asn Pro Phe Leu Tyr Val Phe Val Gly 
305 310 315 320 

Arg Asp Phe Gln Glu Lys Phe Phe Gln Ser Leu Thr Ser Ala Leu Ala
325 330 335

Arg Ala Phe Gly Glu Glu Glu Phe Leu Ser Ser Cys Pro Arg Gly Asn 
340 345 350 

Ala Pro Arg Glu 
20 355 

(248) INFORMATION FOR SEQ ID NO: 247:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247: 
GCAGAATTCG GCGGCCCCAT GGACCTGCCC CC 32

(249) INFORMATION FOR SEQ ID NO: 248:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:248: 
GCTGGATCCC CCGAGCAGTG GCGTTACTTC 30

(250) INFORMATION FOR SEQ ID NO: 249:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 903 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: 

ATGGACCTGC CCCCGCAGCT CTCCTTCGGC CTCTATGTGG CCGCCTTTGC GCTGGGCTTC 60

CCGCTCAACG TCCTGGCCAT CCGAGGCGCG ACGGCCCACG CCCGGCTCCG TCTCACCCCT 120

AGCCTGGTCT ACGCCCTGAA CCTGGGCTGC TCCGACCTGC TGCTGACAGT CTCTCTGCCC 180

CTGAAGGCGG TGGAGGCGCT AGCCTCCGGG GCCTGGCCTC TGCCGGCCTC GCTGTGCCCC 240

GTCTTCGCGG TGGCCCACTT CTTCCCACTC TATGCCGGCG GGGGCTTCCT GGCCGCCCTG 300

AGTGCAGGCC GCTACCTGGG AGCAGCCTTC CCCTTGGGCT ACCAAGCCTT CCGGAGGCCG 360

TGCTATTCCT GGGGGGTGTG CGCGGCCATC TGGGCCCTCG TCCTGTGTCA CCTGGGTCTG 420

GTCTTTGGGT TGGAGGCTCC AGGAGGCTGG CTGGACCACA GCAACACCTC CCTGGGCATC 480

AACACACCGG TCAACGGCTC TCCGGTCTGC CTGGAGGCCT GGGACCCGGC CTCTGCCGGC 540

CCGGCCCGCT TCAGCCTCTC TCTCCTGCTC TTTTTTCTGC CCTTGGCCAT CACAGCCTTC 600

TGCTACGTGG GCTGCCTCCG GGCACTGGCC CGCTCCGGCC TGACGCACAG GCGGAAGCTG 660

CGGGCCGCCT GGGTGGCCGG CGGGGCCCTC CTCACGCTGC TGCTCTGCGT AGGACCCTAC 720

AACGCCTCCA ACGTGGCCAG CTTCCTGTAC CCCAATCTAG GAGGCTCCTG GCGGAAGCTG 780

GGGCTCATCA CGGGTGCCTG GAGTGTGGTG CTTAATCCGC TGGTGACCGG TTACTTGGGA 840

AGGGGTCCTG GCCTGAAGAC AGTGTGTGCG GCAAGAACGC AAGGGGGCAA GTCCCAGAAG 900

TAA 903

(251) INFORMATION FOR SEQ ID NO: 250:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 300 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:

Met Asp Leu Pro Pro Gln Leu Ser Phe Gly Leu Tyr Val Ala Ala Phe
1 5 10 15

Ala Leu Gly Phe Pro Leu Asn Val Leu Ala Ile Arg Gly Ala Thr Ala
20 25 30

His Ala Arg Leu Arg Leu Thr Pro Ser Leu Val Tyr Ala Leu Asn Leu
35 40 45

Gly Cys Ser Asp Leu Leu Leu Thr Val Ser Leu Pro Leu Lys Ala Val
50 55 60

Glu Ala Leu Ala Ser Gly Ala Trp Pro Leu Pro Ala Ser Leu Cys Pro
65 70 75 80

Val Phe Ala Val Ala His Phe Phe Pro Leu Tyr Ala Gly Gly Gly Phe
85 90 95

Leu Ala Ala Leu Ser Ala Gly Arg Tyr Leu Gly Ala Ala Phe Pro Leu
100 105 110

Gly Tyr Gln Ala Phe Arg Arg Pro Cys Tyr Ser Trp Gly Val Cys Ala
115 120 125

Ala Ile Trp Ala Leu Val Leu Cys His Leu Gly Leu Val Phe Gly Leu
130 135 140

Glu Ala Pro Gly Gly Trp Leu Asp His Ser Asn Thr Ser Leu Gly Ile
145 150 155 160

Asn Thr Pro Val Asn Gly Ser Pro Val Cys Leu Glu Ala Trp Asp Pro
165 170 175

Ala Ser Ala Gly Pro Ala Arg Phe Ser Leu Ser Leu Leu Leu Phe Phe
180 185 190

Leu Pro Leu Ala Ile Thr Ala Phe Cys Tyr Val Gly Cys Leu Arg Ala
195 200 205

Leu Ala Arg Ser Gly Leu Thr His Arg Arg Lys Leu Arg Ala Ala Trp
210 215 220

Val Ala Gly Gly Ala Leu Leu Thr Leu Leu Leu Cys Val Gly Pro Tyr
225 230 235 240

Asn Ala Ser Asn Val Ala Ser Phe Leu Tyr Pro Asn Leu Gly Gly Ser
245 250 255

Trp Arg Lys Leu Gly Leu Ile Thr Gly Ala Trp Ser Val Val Leu Asn
260 265 270

Pro Leu Val Thr Gly Tyr Leu Gly Arg Gly Pro Gly Leu Lys Thr Val
275 280 285

Cys Ala Ala Arg Thr Gln Gly Gly Lys Ser Gln Lys
290 295 300
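To cross-check a coding sequence against its protein entry, the DNA can be translated codon by codon. The Python sketch below is illustrative only: it assumes the standard genetic code (NCBI translation table 1), and the `translate` helper is hypothetical rather than part of the disclosure. Applied to the first 30 nucleotides of SEQ ID NO: 249 it yields the first ten residues of SEQ ID NO: 250.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# One-letter codes mapped to the three-letter codes used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate in-frame DNA into space-separated three-letter residue codes.

    Whitespace (such as the 10-nucleotide grouping of the listing) is ignored,
    and translation stops at the first in-frame stop codon.
    """
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        residues.append(THREE_LETTER[aa])
    return " ".join(residues)


# First 30 nucleotides of SEQ ID NO: 249, grouped as in the listing.
print(translate("ATGGACCTGC CCCCGCAGCT CTCCTTCGGC"))
```

The same call on a complete entry stops at the first in-frame stop codon, so the terminal TAA, TAG or TGA is not reported as a residue.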

(252) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251: 
CTCAAGCTTA CTCTCTCTCA CCAGTGGCCA C 31 

(253) INFORMATION FOR SEQ ID NO: 252: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252:

CCCTCCTCCC CCGGAGGACC TAGC 24 

(254) INFORMATION FOR SEQ ID NO:253: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1041 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:253: 

ATGGATACAG GCCCCGACCA GTCCTACTTC TCCGGCAATC ACTGGTTCGT CTTCTCGGTG 60

TACCTTCTCA CTTTCCTGGT GGGGCTCCCC CTCAACCTGC TGGCCCTGGT GGTCTTCGTG 120

GGCAAGCTGC AGCGCCGCCC GGTGGCCGTG GACGTGCTCC TGCTCAACCT GACCGCCTCG 180

GACCTGCTCC TGCTGCTGTT CCTGCCTTTC CGCATGGTGG AGGCAGCCAA TGGCATGCAC 240

TGGCCCCTGC CCTTCATCCT CTGCCCACTC TCTGGATTCA TCTTCTTCAC CACCATCTAT 300

CTCACCGCCC TCTTCCTGGC AGCTGTGAGC ATTGAACGCT TCCTGAGTGT GGCCCACCCA 360

CTGTGGTACA AGACCCGGCC GAGGCTGGGG CAGGCAGGTC TGGTGAGTGT GGCCTGCTGG 420

CTGTTGGCCT CTGCTCACTG CAGCGTGGTC TACGTCATAG AATTCTCAGG GGACATCTCC 480

CACAGCCAGG GCACCAATGG GACCTGCTAC CTGGAGTTCC GGAAGGACCA GCTAGCCATC 540

CTCCTGCCCG TGCGGCTGGA GATGGCTGTG GTCCTCTTTG TGGTCCCGCT GATCATCACC 600

AGCTACTGCT ACAGCCGCCT GGTGTGGATC CTCGGCAGAG GGGGCAGCCA CCGCCGGCAG 660

AGGAGGGTGG CGGGGCTGTT GGCGGCCACG CTGCTCAACT TCCTTGTCTG CTTTGGGCCC 720

TACAACGTGT CCCATGTCGT GGGCTATATC TGCGGTGAAA GCCCGGCATG GAGGATCTAC 780

GTGACGCTTC TCAGCACCCT GAACTCCTGT GTCGACCCCT TTGTCTACTA CTTCTCCTCC 840

TCCGGGTTCC AAGCCGACTT TCATGAGCTG CTGAGGAGGT TGTGTGGGCT CTGGGGCCAG 900

TGGCAGCAGG AGAGCAGCAT GGAGCTGAAG GAGCAGAAGG GAGGGGAGGA GCAGAGAGCG 960

GACCGACCAG CTGAAAGAAA GACCAGTGAA CACTCACAGG GCTGTGGAAC TGGTGGCCAG 1020

GTGGCCTGTG CTGAAAGCTA G 1041
(255) INFORMATION FOR SEQ ID NO:254: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 346 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:

Met Asp Thr Gly Pro Asp Gln Ser Tyr Phe Ser Gly Asn His Trp Phe
1 5 10 15

Val Phe Ser Val Tyr Leu Leu Thr Phe Leu Val Gly Leu Pro Leu Asn
20 25 30

Leu Leu Ala Leu Val Val Phe Val Gly Lys Leu Gln Arg Arg Pro Val
35 40 45

Ala Val Asp Val Leu Leu Leu Asn Leu Thr Ala Ser Asp Leu Leu Leu
50 55 60

Leu Leu Phe Leu Pro Phe Arg Met Val Glu Ala Ala Asn Gly Met His
65 70 75 80

Trp Pro Leu Pro Phe Ile Leu Cys Pro Leu Ser Gly Phe Ile Phe Phe
85 90 95

Thr Thr Ile Tyr Leu Thr Ala Leu Phe Leu Ala Ala Val Ser Ile Glu
100 105 110

Arg Phe Leu Ser Val Ala His Pro Leu Trp Tyr Lys Thr Arg Pro Arg
115 120 125

Leu Gly Gln Ala Gly Leu Val Ser Val Ala Cys Trp Leu Leu Ala Ser
130 135 140

Ala His Cys Ser Val Val Tyr Val Ile Glu Phe Ser Gly Asp Ile Ser
145 150 155 160

His Ser Gln Gly Thr Asn Gly Thr Cys Tyr Leu Glu Phe Arg Lys Asp
165 170 175

Gln Leu Ala Ile Leu Leu Pro Val Arg Leu Glu Met Ala Val Val Leu
180 185 190

Phe Val Val Pro Leu Ile Ile Thr Ser Tyr Cys Tyr Ser Arg Leu Val
195 200 205

Trp Ile Leu Gly Arg Gly Gly Ser His Arg Arg Gln Arg Arg Val Ala
210 215 220

Gly Leu Leu Ala Ala Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro
225 230 235 240

Tyr Asn Val Ser His Val Val Gly Tyr Ile Cys Gly Glu Ser Pro Ala
245 250 255

Trp Arg Ile Tyr Val Thr Leu Leu Ser Thr Leu Asn Ser Cys Val Asp
260 265 270

Pro Phe Val Tyr Tyr Phe Ser Ser Ser Gly Phe Gln Ala Asp Phe His
275 280 285

Glu Leu Leu Arg Arg Leu Cys Gly Leu Trp Gly Gln Trp Gln Gln Glu
290 295 300

Ser Ser Met Glu Leu Lys Glu Gln Lys Gly Gly Glu Glu Gln Arg Ala
305 310 315 320

Asp Arg Pro Ala Glu Arg Lys Thr Ser Glu His Ser Gln Gly Cys Gly
325 330 335

Thr Gly Gly Gln Val Ala Cys Ala Glu Ser
340 345

(256) INFORMATION FOR SEQ ID NO: 255:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:
TTTAAGCTTC CCCTCCAGGA TGCTGCCGGA C 31

(257) INFORMATION FOR SEQ ID NO: 256: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:256: 
GGCGAATTCT GAAGGTCCAG GGAAACTGCT A 31

(258) INFORMATION FOR SEQ ID NO: 257:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:

ATGCTGCCGG ACTGGAAGAG CTCCTTGATC CTCATGGCTT ACATCATCAT CTTCCTCACT 60

GGCCTCCCTG CCAACCTCCT GGCCCTGCGG GCCTTTGTGG GGCGGATCCG CCAGCCCCAG 120

CCTGCACCTG TGCACATCCT CCTGCTGAGC CTGACGCTGG CCGACCTCCT CCTGCTGCTG 180

CTGCTGCCCT TCAAGATCAT CGAGGCTGCG TCGAACTTCC GCTGGTACCT GCCCAAGGTC 240

GTCTGCGCCC TCACGAGTTT TGGCTTCTAC AGCAGCATCT ACTGCAGCAC GTGGCTCCTG 300

GCGGGCATCA GCATCGAGCG CTACCTGGGA GTGGCTTTCC CCGTGCAGTA CAAGCTCTCC 360

CGCCGGCCTC TGTATGGAGT GATTGCAGCT CTGGTGGCCT GGGTTATGTC CTTTGGTCAC 420

TGCACCATCG TGATCATCGT TCAATACTTG AACACGACTG AGCAGGTCAG AAGTGGCAAT 480

GAAATTACCT GCTACGAGAA CTTCACCGAT AACCAGTTGG ACGTGGTGCT GCCCGTGCGG 540

CTGGAGCTGT GCCTGGTGCT CTTCTTCATC CCCATGGCAG TCACCATCTT CTGCTACTGG 600

CGTTTTGTGT GGATCATGCT CTCCCAGCCC CTTGTGGGGG CCCAGAGGCG GCGCCGAGCC 660

GTGGGGCTGG CTGTGGTGAC GCTGCTCAAT TTCCTGGTGT GCTTCGGACC TTACAACGTG 720

TCCCACCTGG TGGGGTATCA CCAGAGAAAA AGCCCCTGGT GGCGGTCAAT AGCCGTGGTG 780

TTCAGTTCAC TCAACGCCAG TCTGGACCCC CTGCTCTTCT ATTTCTCTTC TTCAGTGGTG 840

CGCAGGGCAT TTGGGAGAGG GCTGCAGGTG CTGCGGAATC AGGGCTCCTC CCTGTTGGGA 900

CGCAGAGGCA AAGACACAGC AGAGGGGACA AATGAGGACA GGGGTGTGGG TCAAGGAGAA 960

GGGATGCCAA GTTCGGACTT CACTACAGAG TAG 993
(259) INFORMATION FOR SEQ ID NO: 258: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:258: 

Met Leu Pro Asp Trp Lys Ser Ser Leu Ile Leu Met Ala Tyr Ile Ile
1 5 10 15

Ile Phe Leu Thr Gly Leu Pro Ala Asn Leu Leu Ala Leu Arg Ala Phe
20 25 30

Val Gly Arg Ile Arg Gln Pro Gln Pro Ala Pro Val His Ile Leu Leu
35 40 45

Leu Ser Leu Thr Leu Ala Asp Leu Leu Leu Leu Leu Leu Leu Pro Phe
50 55 60

Lys Ile Ile Glu Ala Ala Ser Asn Phe Arg Trp Tyr Leu Pro Lys Val
65 70 75 80

Val Cys Ala Leu Thr Ser Phe Gly Phe Tyr Ser Ser Ile Tyr Cys Ser
85 90 95

Thr Trp Leu Leu Ala Gly Ile Ser Ile Glu Arg Tyr Leu Gly Val Ala
100 105 110

Phe Pro Val Gln Tyr Lys Leu Ser Arg Arg Pro Leu Tyr Gly Val Ile
115 120 125

Ala Ala Leu Val Ala Trp Val Met Ser Phe Gly His Cys Thr Ile Val
130 135 140

Ile Ile Val Gln Tyr Leu Asn Thr Thr Glu Gln Val Arg Ser Gly Asn
145 150 155 160

Glu Ile Thr Cys Tyr Glu Asn Phe Thr Asp Asn Gln Leu Asp Val Val
165 170 175

Leu Pro Val Arg Leu Glu Leu Cys Leu Val Leu Phe Phe Ile Pro Met
180 185 190

Ala Val Thr Ile Phe Cys Tyr Trp Arg Phe Val Trp Ile Met Leu Ser
195 200 205

Gln Pro Leu Val Gly Ala Gln Arg Arg Arg Arg Ala Val Gly Leu Ala
210 215 220

Val Val Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro Tyr Asn Val
225 230 235 240

Ser His Leu Val Gly Tyr His Gln Arg Lys Ser Pro Trp Trp Arg Ser
245 250 255

Ile Ala Val Val Phe Ser Ser Leu Asn Ala Ser Leu Asp Pro Leu Leu
260 265 270

Phe Tyr Phe Ser Ser Ser Val Val Arg Arg Ala Phe Gly Arg Gly Leu
275 280 285

Gln Val Leu Arg Asn Gln Gly Ser Ser Leu Leu Gly Arg Arg Gly Lys
290 295 300

Asp Thr Ala Glu Gly Thr Asn Glu Asp Arg Gly Val Gly Gln Gly Glu
305 310 315 320

Gly Met Pro Ser Ser Asp Phe Thr Thr Glu
325 330
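Each coding sequence in this part of the listing ends in a stop codon (TAA, TAG or TGA), so the number of encoded residues follows from the nucleotide count as bp / 3 minus 1. The Python sketch below applies that arithmetic; the `protein_length` helper is hypothetical and assumes every entry is a single in-frame open reading frame. For SEQ ID NO: 257 (993 base pairs) it gives 330 residues, which matches the last position numbered in SEQ ID NO: 258.

```python
def protein_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an in-frame coding sequence of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_bp // 3
    # The terminal stop codon does not encode a residue.
    return codons - 1 if includes_stop else codons


# DNA SEQ ID NO -> listed length in base pairs (each entry ends in a stop codon).
CODING_SEQUENCES = {249: 903, 253: 1041, 257: 993, 263: 2724, 269: 1071}

for seq_id, bp in CODING_SEQUENCES.items():
    print(f"SEQ ID NO: {seq_id}: {bp} bp -> {protein_length(bp)} amino acids")
```

The other results (300, 346, 907 and 356) agree with the stated lengths of SEQ ID NOS: 250, 254, 264 and 270.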



(260) INFORMATION FOR SEQ ID NO: 259: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259:
CCCAAGCTTC GGGCACCATG GACACCTCCC 30

(261) INFORMATION FOR SEQ ID NO: 260:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260: 
ACAGGATCCA AATGCACAGC ACTGGTAAGC 30

(262) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261:
CTATAACTGG GTTACATGGT TTAAC 25

(263) INFORMATION FOR SEQ ID NO: 262:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:262: 
TTTGAATTCA CATATTAATT AGAGACATGG 30 
(264) INFORMATION FOR SEQ ID NO: 263:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2724 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:263: 

ATGGACACCT CCCGGCTCGG TGTGCTCCTG TCCTTGCCTG TGCTGCTGCA GCTGGCGACC 60

GGGGGCAGCT CTCCCAGGTC TGGTGTGTTG CTGAGGGGCT GCCCCACACA CTGTCATTGC 120

GAGCCCGACG GCAGGATGTT GCTCAGGGTG GACTGCTCCG ACCTGGGGCT CTCGGAGCTG 180

CCTTCCAACC TCAGCGTCTT CACCTCCTAC CTAGACCTCA GTATGAACAA CATCAGTCAG 240

CTGCTCCCGA ATCCCCTGCC CAGTCTCCGC TTCCTGGAGG AGTTACGTCT TGCGGGAAAC 300

GCTCTGACAT ACATTCCCAA GGGAGCATTC ACTGGCCTTT ACAGTCTTAA AGTTCTTATG 360

CTGCAGAATA ATCAGCTAAG ACACGTACCC ACAGAAGCTC TGCAGAATTT GCGAAGCCTT 420

CAATCCCTGC GTCTGGATGC TAACCACATC AGCTATGTGC CCCCAAGCTG TTTCAGTGGC 480

CTGCATTCCC TGAGGCACCT GTGGCTGGAT GACAATGCGT TAACAGAAAT CCCCGTCCAG 540

GCTTTTAGAA GTTTATCGGC ATTGCAAGCC ATGACCTTGG CCCTGAACAA AATACACCAC 600

ATACCAGACT ATGCCTTTGG AAACCTCTCC AGCTTGGTAG TTCTACATCT CCATAACAAT 660

AGAATCCACT CCCTGGGAAA GAAATGCTTT GATGGGCTCC ACAGCCTAGA GACTTTAGAT 720

TTAAATTACA ATAACCTTGA TGAATTCCCC ACTGCAATTA GGACACTCTC CAACCTTAAA 780

GAACTAGGAT TTCATAGCAA CAATATCAGG TCGATACCTG AGAAAGCATT TGTAGGCAAC 840

CCTTCTCTTA TTACAATACA TTTCTATGAC AATCCCATCC AATTTGTTGG GAGATCTGCT 900

TTTCAACATT TACCTGAACT AAGAACACTG ACTCTGAATG GTGCCTCACA AATAACTGAA 960

TTTCCTGATT TAACTGGAAC TGCAAACCTG GAGAGTCTGA CTTTAACTGG AGCACAGATC 1020

TCATCTCTTC CTCAAACCGT CTGCAATCAG TTACCTAATC TCCAAGTGCT AGATCTGTCT 1080

TACAACCTAT TAGAAGATTT ACCCAGTTTT TCAGTCTGCC AAAAGCTTCA GAAAATTGAC 1140

CTAAGACATA ATGAAATCTA CGAAATTAAA GTTGACACTT TCCAGCAGTT GCTTAGCCTC 1200

CGATCGCTGA ATTTGGCTTG GAACAAAATT GCTATTATTC ACCCCAATGC ATTTTCCACT 1260

TTGCCATCCC TAATAAAGCT GGACCTATCG TCCAACCTCC TGTCGTCTTT TCCTATAACT 1320

GGGTTACATG GTTTAACTCA CTTAAAATTA ACAGGAAATC ATGCCTTACA GAGCTTGATA 1380

TCATCTGAAA ACTTTCCAGA ACTCAAGGTT ATAGAAATGC CTTATGCTTA CCAGTGCTGT 1440

GCATTTGGAG TGTGTGAGAA TGCCTATAAG ATTTCTAATC AATGGAATAA AGGTGACAAC 1500

AGCAGTATGG ACGACCTTCA TAAGAAAGAT GCTGGAATGT TTCAGGCTCA AGATGAACGT 1560

GACCTTGAAG ATTTCCTGCT TGACTTTGAG GAAGACCTGA AAGCCCTTCA TTCAGTGCAG 1620

TGTTCACCTT CCCCAGGCCC CTTCAAACCC TGTGAACACC TGCTTGATGG CTGGCTGATC 1680

AGAATTGGAG TGTGGACCAT AGCAGTTCTG GCACTTACTT GTAATGCTTT GGTGACTTCA 1740

ACAGTTTTCA GATCCCCTCT GTACATTTCC CCCATTAAAC TGTTAATTGG GGTCATCGCA 1800

GCAGTGAACA TGCTCACGGG AGTCTCCAGT GCCGTGCTGG CTGGTGTGGA TGCGTTCACT 1860

TTTGGCAGCT TTGCACGACA TGGTGCCTGG TGGGAGAATG GGGTTGGTTG CCATGTCATT 1920

GGTTTTTTGT CCATTTTTGC TTCAGAATCA TCTGTTTTCC TGCTTACTCT GGCAGCCCTG 1980

GAGCGTGGGT TCTCTGTGAA ATATTCTGCA AAATTTGAAA CGAAAGCTCC ATTTTCTAGC 2040






WO 00/22129 PCT/US99/23938


CTGAAAGTAA TCATTTTGCT CTGTGCCCTG CTGGCCTTGA CCATGGCCGC AGTTCCCCTG 2100

CTGGGTGGCA GCAAGTATGG CGCCTCCCCT CTCTGCCTGC CTTTGCCTTT TGGGGAGCCC 2160

AGCACCATGG GCTACATGGT CGCTCTCATC TTGCTCAATT CCCTTTGCTT CCTCATGATG 2220

ACCATTGCCT ACACCAAGCT CTACTGCAAT TTGGACAAGG GAGACCTGGA GAATATTTGG 2280

GACTGCTCTA TGGTAAAACA CATTGCCCTG TTGCTCTTCA CCAACTGCAT CCTAAACTGC 2340

CCTGTGGCTT TCTTGTCCTT CTCCTCTTTA ATAAACCTTA CATTTATCAG TCCTGAAGTA 2400

ATTAAGTTTA TCCTTCTGGT GGTAGTCCCA CTTCCTGCAT GTCTCAATCC CCTTCTCTAC 2460

ATCTTGTTCA ATCCTCACTT TAAGGAGGAT CTGGTGAGCC TGAGAAAGCA AACCTACGTC 2520

TGGACAAGAT CAAAACACCC AAGCTTGATG TCAATTAACT CTGATGATGT CGAAAAACAG 2580

TCCTGTGACT CAACTCAAGC CTTGGTAACC TTTACCAGCT CCAGCATCAC TTATGACCTG 2640

CCTCCCAGTT CCGTGCCATC ACCAGCTTAT CCAGTGACTG AGAGCTGCCA TCTTTCCTCT 2700

GTGGCATTTG TCCCATGTCT CTAA 2724
(265) INFORMATION FOR SEQ ID NO: 264: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 907 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:

Met Asp Thr Ser Arg Leu Gly Val Leu Leu Ser Leu Pro Val Leu Leu
1 5 10 15

Gln Leu Ala Thr Gly Gly Ser Ser Pro Arg Ser Gly Val Leu Leu Arg
20 25 30

Gly Cys Pro Thr His Cys His Cys Glu Pro Asp Gly Arg Met Leu Leu
35 40 45

Arg Val Asp Cys Ser Asp Leu Gly Leu Ser Glu Leu Pro Ser Asn Leu
50 55 60

Ser Val Phe Thr Ser Tyr Leu Asp Leu Ser Met Asn Asn Ile Ser Gln
65 70 75 80

Leu Leu Pro Asn Pro Leu Pro Ser Leu Arg Phe Leu Glu Glu Leu Arg
85 90 95

Leu Ala Gly Asn Ala Leu Thr Tyr Ile Pro Lys Gly Ala Phe Thr Gly
100 105 110

Leu Tyr Ser Leu Lys Val Leu Met Leu Gln Asn Asn Gln Leu Arg His
115 120 125

Val Pro Thr Glu Ala Leu Gln Asn Leu Arg Ser Leu Gln Ser Leu Arg
130 135 140

Leu Asp Ala Asn His Ile Ser Tyr Val Pro Pro Ser Cys Phe Ser Gly
145 150 155 160

Leu His Ser Leu Arg His Leu Trp Leu Asp Asp Asn Ala Leu Thr Glu
165 170 175

Ile Pro Val Gln Ala Phe Arg Ser Leu Ser Ala Leu Gln Ala Met Thr
180 185 190

Leu Ala Leu Asn Lys Ile His His Ile Pro Asp Tyr Ala Phe Gly Asn
195 200 205

Leu Ser Ser Leu Val Val Leu His Leu His Asn Asn Arg Ile His Ser
210 215 220

Leu Gly Lys Lys Cys Phe Asp Gly Leu His Ser Leu Glu Thr Leu Asp
225 230 235 240

Leu Asn Tyr Asn Asn Leu Asp Glu Phe Pro Thr Ala Ile Arg Thr Leu
245 250 255

Ser Asn Leu Lys Glu Leu Gly Phe His Ser Asn Asn Ile Arg Ser Ile
260 265 270

Pro Glu Lys Ala Phe Val Gly Asn Pro Ser Leu Ile Thr Ile His Phe
275 280 285

Tyr Asp Asn Pro Ile Gln Phe Val Gly Arg Ser Ala Phe Gln His Leu
290 295 300

Pro Glu Leu Arg Thr Leu Thr Leu Asn Gly Ala Ser Gln Ile Thr Glu
305 310 315 320

Phe Pro Asp Leu Thr Gly Thr Ala Asn Leu Glu Ser Leu Thr Leu Thr
325 330 335

Gly Ala Gln Ile Ser Ser Leu Pro Gln Thr Val Cys Asn Gln Leu Pro
340 345 350

Asn Leu Gln Val Leu Asp Leu Ser Tyr Asn Leu Leu Glu Asp Leu Pro
355 360 365

Ser Phe Ser Val Cys Gln Lys Leu Gln Lys Ile Asp Leu Arg His Asn
370 375 380

Glu Ile Tyr Glu Ile Lys Val Asp Thr Phe Gln Gln Leu Leu Ser Leu
385 390 395 400



30 



35 
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Arg Ser Leu Asn Leu Ala Trp Asn Lys lie Ala lie lie His Pro Asn 
405 410 415 

Ala Phe Ser Thr Leu Pro Ser Leu lie Lys Leu Asp Leu Ser Ser Asn 
420 425 430 

5 Leu Leu Ser Ser Phe Pro lie Thr Gly Leu His Gly Leu Thr His Leu 

435 440 445 

Lys Leu Thr Gly Asn His Ala Leu Gin Ser Leu He Ser Ser Glu Asn 
450 455 460 

Phe Pro Glu Leu Lys Val He Glu Met Pro Tyr Ala Tyr Gin Cys Cys 
0 465 470 475 480 

Ala Phe Gly Val Cys Glu Asn Ala Tyr Lys He Ser Asn Gin Trp Asn 
485 490 495 

Lys Gly Asp Asn Ser Ser Met Asp Asp Leu His Lys Lys Asp Ala Gly 
500 505 510 

5 Met Phe Gin Ala Gin Asp Glu Arg Asp Leu Glu Asp Phe Leu Leu Asp 

515 520 525 

Phe Glu Glu Asp Leu Lys Ala Leu His Ser Val Gin Cys Ser Pro Ser 
530 535 540 

Pro Gly Pro Phe Lys Pro Cys Glu His Leu Leu Asp Gly Trp Leu lie 
0 545 550 555 560 

Arg lie Gly Val Trp Thr He Ala Val Leu Ala Leu Thr Cys Asn Ala 
565 570 575 

Leu Val Thr Ser Thr Val Phe Arg Ser Pro Leu Tyr He Ser Pro He 
580 585 590 

Lys Leu Leu He Gly Val He Ala Ala Val Asn Met Leu Thr Gly Val 
595 600 605 

Ser Ser Ala Val Leu Ala Gly Val Asp Ala Phe Thr Phe Gly Ser Phe 
610 615 620 

Ala Arg His Gly Ala Trp Trp Glu Asn Gly Val Gly Cys His Val He 
625 630 635 640 

Gly Phe Leu Ser He Phe Ala Ser Glu Ser Ser Val Phe Leu Leu Thr 
645 650 655 

Leu Ala Ala Leu Glu Arg Gly Phe Ser Val Lys Tyr Ser Ala Lys Phe 
660 665 670 

Glu Thr Lys Ala Pro Phe Ser Ser Leu Lys Val He He Leu Leu Cys 
675 680 685 



Ala Leu Leu Ala Leu Thr Met Ala Ala Val Pro Leu Leu Gly Gly Ser 
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690 695 700 

Lys Tyr Gly Ala Ser Pro Leu Cys Leu Pro Leu Pro Phe Gly Glu Pro 
705 710 715 720 

Ser Thr Met Gly Tyr Met Val Ala Leu lie Leu Leu Asn Ser Leu Cys 
5 725 730 735 

Phe Leu Met Met Thr lie Ala Tyr Thr Lys Leu Tyr Cys Asn Leu Asp 
740 745 750 

Lys Gly Asp Leu Glu Asn lie Trp Asp Cys Ser Met Val Lys His lie 
755 760 765 

10 Ala Leu Leu Leu Phe Thr Asn Cys lie Leu Asn Cys Pro Val Ala Phe 

770 775 780 

Leu Ser Phe Ser Ser Leu lie Asn Leu Thr Phe He Ser Pro Glu Val 
785 790 795 800 

He Lys Phe lie Leu Leu Val Val Val Pro Leu Pro Ala Cys Leu Asn 
15 805 810 815 

Pro Leu Leu Tyr He Leu Phe Asn Pro His Phe Lys Glu Asp Leu Val 
820 825 830 

Ser Leu Arg Lys Gin Thr Tyr Val Trp Thr Arg Ser Lys His Pro Ser 
835 840 845 

20 Leu Met Ser He Asn Ser Asp Asp Val Glu Lys Gin Ser Cys Asp Ser 

850 855 860 

Thr Gin Ala Leu Val Thr Phe Thr Ser Ser Ser He Thr Tyr Asp Leu 
865 870 875 880 

Pro Pro Ser Ser Val Pro Ser Pro Ala Tyr Pro Val Thr Glu Ser Cys 
25 885 890 895 

His Leu Ser Ser Val Ala Phe Val Pro Cys Leu 
900 905 

(266) INFORMATION FOR SEQ ID NO: 265: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:

CGGAAGCTGC GGGCCAAATG GGTGGCCGGC 30 
(267) INFORMATION FOR SEQ ID NO: 266:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:266: 

CAGAGGAGGG TGAAGGGGCT GTTGGCG 27

(268) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267:
GGCGGCGCCG AGCCAAGGGG CTGGCTGTGG 30

(269) INFORMATION FOR SEQ ID NO:268: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:268: 
GGGACTGCTC TATGAAAAAA CACATTGCCC TG 32 
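SEQ ID NO: 268 appears to be a mutagenic oligonucleotide: aligned with nucleotides 2279 to 2310 of SEQ ID NO: 263, it differs only at two adjacent bases. The Python sketch below, built around a hypothetical `codon_changes` helper that assumes the two strings are already aligned and that the reading frame is known, reports which codon the difference falls in.

```python
def codon_changes(wild_type: str, mutant: str, frame: int):
    """List codons that differ between two aligned, equal-length DNA strings.

    `frame` is the 0-based index at which the first complete codon starts.
    Returns (index, wild_type_codon, mutant_codon) tuples.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    changes = []
    for start in range(frame, len(wild_type) - 2, 3):
        wt_codon = wild_type[start:start + 3]
        mut_codon = mutant[start:start + 3]
        if wt_codon != mut_codon:
            changes.append((start, wt_codon, mut_codon))
    return changes


# Nucleotides 2279-2310 of SEQ ID NO: 263 versus SEQ ID NO: 268.
# Nucleotide 2281 (index 2) is the first base of a codon in SEQ ID NO: 263.
WILD_TYPE = "GGGACTGCTCTATGGTAAAACACATTGCCCTG"
PRIMER = "GGGACTGCTCTATGAAAAAACACATTGCCCTG"

print(codon_changes(WILD_TYPE, PRIMER, frame=2))  # [(14, 'GTA', 'AAA')]
```

Index 14 corresponds to nucleotides 2293 to 2295, i.e. codon 765, which reads Val (GTA) in SEQ ID NO: 264 and would become Lys (AAA). The same comparison of SEQ ID NO: 265 against nucleotides 652 to 681 of SEQ ID NO: 249 (frame 0) reports GCC (Ala) changed to AAA (Lys) at codon 223.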

(270) INFORMATION FOR SEQ ID NO: 269: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1071 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269:

ATGAATGGGG TCTCGGAGGG GACCAGAGGC TGCAGTGACA GGCAACCTGG GGTCCTGACA 60

CGTGATCGCT CTTGTTCCAG GAAGATGAAC TCTTCCGGAT GCCTGTCTGA GGAGGTGGGG 120

TCCCTCCGCC CACTGACTGT GGTTATCCTG TCTGCGTCCA TTGTCGTCGG AGTGCTGGGC 180

AATGGGCTGG TGCTGTGGAT GACTGTCTTC CGTATGGCAC GCACGGTCTC CACCGTCTGC 240

TTCTTCCACC TGGCCCTTGC CGATTTCATG CTCTCACTGT CTCTGCCCAT TGCCATGTAC 300

TATATTGTCT CCAGGCAGTG GCTCCTCGGA GAGTGGGCCT GCAAACTCTA CATCACCTTT 360

GTGTTCCTCA GCTACTTTGC CAGTAACTGC CTCCTTGTCT TCATCTCTGT GGACCGTTGC 420

ATCTCTGTCC TCTACCCCGT CTGGGCCCTG AACCACCGCA CTGTGCAGCG GGCGAGCTGG 480

CTGGCCTTTG GGGTGTGGCT CCTGGCCGCC GCCTTGTGCT CTGCGCACCT GAAATTCCGG 540

ACAACCAGAA AATGGAATGG CTGTACGCAC TGCTACTTGG CGTTCAACTC TGACAATGAG 600

ACTGCCCAGA TTTGGATTGA AGGGGTCGTG GAGGGACACA TTATAGGGAC CATTGGCCAC 660

TTCCTGCTGG GCTTCCTGGG GCCCTTAGCA ATCATAGGCA CCTGCGCCCA CCTCATCCGG 720

GCCAAGCTCT TGCGGGAGGG CTGGGTCCAT GCCAACCGGC CCAAGAGGCT GCTGCTGGTG 780

CTGGTGAGCG CTTTCTTTAT CTTCTGGTCC CCGTTTAACG TGGTGCTGTT GGTCCATCTG 840

TGGCGACGGG TGATGCTCAA GGAAATCTAC CACCCCCGGA TGCTGCTCAT CCTCCAGGCT 900

AGCTTTGCCT TGGGCTGTGT CAACAGCAGC CTCAACCCCT TCCTCTACGT CTTCGTTGGC 960

AGAGATTTCC AAGAAAAGTT TTTCCAGTCT TTGACTTCTG CCCTGGCGAG GGCGTTTGGA 1020

GAGGAGGAGT TTCTGTCATC CTGTCCCCGT GGCAACGCCC CCCGGGAATG A 1071

(271) INFORMATION FOR SEQ ID NO: 270: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 356 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270:

Met Asn Gly Val Ser Glu Gly Thr Arg Gly Cys Ser Asp Arg Gln Pro
1 5 10 15

Gly Val Leu Thr Arg Asp Arg Ser Cys Ser Arg Lys Met Asn Ser Ser
20 25 30

Gly Cys Leu Ser Glu Glu Val Gly Ser Leu Arg Pro Leu Thr Val Val
35 40 45

Ile Leu Ser Ala Ser Ile Val Val Gly Val Leu Gly Asn Gly Leu Val
50 55 60

Leu Trp Met Thr Val Phe Arg Met Ala Arg Thr Val Ser Thr Val Cys
65 70 75 80

Phe Phe His Leu Ala Leu Ala Asp Phe Met Leu Ser Leu Ser Leu Pro
85 90 95

Ile Ala Met Tyr Tyr Ile Val Ser Arg Gln Trp Leu Leu Gly Glu Trp
100 105 110

Ala Cys Lys Leu Tyr Ile Thr Phe Val Phe Leu Ser Tyr Phe Ala Ser
115 120 125

Asn Cys Leu Leu Val Phe Ile Ser Val Asp Arg Cys Ile Ser Val Leu
130 135 140

Tyr Pro Val Trp Ala Leu Asn His Arg Thr Val Gln Arg Ala Ser Trp
145 150 155 160

Leu Ala Phe Gly Val Trp Leu Leu Ala Ala Ala Leu Cys Ser Ala His
165 170 175

Leu Lys Phe Arg Thr Thr Arg Lys Trp Asn Gly Cys Thr His Cys Tyr
180 185 190

Leu Ala Phe Asn Ser Asp Asn Glu Thr Ala Gln Ile Trp Ile Glu Gly
195 200 205

Val Val Glu Gly His Ile Ile Gly Thr Ile Gly His Phe Leu Leu Gly
210 215 220

Phe Leu Gly Pro Leu Ala Ile Ile Gly Thr Cys Ala His Leu Ile Arg
225 230 235 240

Ala Lys Leu Leu Arg Glu Gly Trp Val His Ala Asn Arg Pro Lys Arg
245 250 255

Leu Leu Leu Val Leu Val Ser Ala Phe Phe Ile Phe Trp Ser Pro Phe
260 265 270

Asn Val Val Leu Leu Val His Leu Trp Arg Arg Val Met Leu Lys Glu
275 280 285

Ile Tyr His Pro Arg Met Leu Leu Ile Leu Gln Ala Ser Phe Ala Leu
290 295 300

Gly Cys Val Asn Ser Ser Leu Asn Pro Phe Leu Tyr Val Phe Val Gly
305 310 315 320

Arg Asp Phe Gln Glu Lys Phe Phe Gln Ser Leu Thr Ser Ala Leu Ala
325 330 335

Arg Ala Phe Gly Glu Glu Glu Phe Leu Ser Ser Cys Pro Arg Gly Asn
340 345 350

Ala Pro Arg Glu
355
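Earlier in this specification the preferred mutation is written as the motif P1 AA15 X, read from the C-terminus toward the N-terminus: an endogenous TM6 proline, 15 endogenous residues, then a non-endogenous lysine. On that reading X lies 16 positions N-terminal of the proline. The Python sketch below checks this against residues 241 to 272 of SEQ ID NO: 270; it assumes (this is not stated in the listing itself) that Pro271 in "Trp Ser Pro Phe" is the TM6 proline, and the helper names are hypothetical.

```python
# Residues 241-272 of SEQ ID NO: 270 (three-letter codes, as listed).
FRAGMENT_START = 241
FRAGMENT = (
    "Ala Lys Leu Leu Arg Glu Gly Trp Val His Ala Asn Arg Pro Lys Arg "
    "Leu Leu Leu Val Leu Val Ser Ala Phe Phe Ile Phe Trp Ser Pro Phe"
).split()


def residue(pos: int) -> str:
    """Residue at 1-based position `pos` of the full protein (within the fragment)."""
    return FRAGMENT[pos - FRAGMENT_START]


def x_position(proline_pos: int, spacer: int = 15) -> int:
    """Position X lying `spacer` residues N-terminal of the proline in P-AA15-X."""
    return proline_pos - spacer - 1


TM6_PROLINE = 271  # assumption: the Pro in "...Phe Trp Ser Pro Phe"
x = x_position(TM6_PROLINE)
print(x, residue(TM6_PROLINE), residue(x))  # 255 Pro Lys
```

The same arithmetic places X at position 223 in SEQ ID NO: 272 (Pro239) and at 224 in SEQ ID NO: 274 (Pro240); in each case that residue reads Lys in the mutant entry and Ala in the corresponding sequence (SEQ ID NOS: 250 and 254), just as position 255 reads Ala in SEQ ID NO: 246.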

(272) INFORMATION FOR SEQ ID NO: 271: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 903 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 




(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:271: 

ATGGACCTGC CCCCGCAGCT CTCCTTCGGC CTCTATGTGG CCGCCTTTGC GCTGGGCTTC 60

CCGCTCAACG TCCTGGCCAT CCGAGGCGCG ACGGCCCACG CCCGGCTCCG TCTCACCCCT 120

AGCCTGGTCT ACGCCCTGAA CCTGGGCTGC TCCGACCTGC TGCTGACAGT CTCTCTGCCC 180

CTGAAGGCGG TGGAGGCGCT AGCCTCCGGG GCCTGGCCTC TGCCGGCCTC GCTGTGCCCC 240

GTCTTCGCGG TGGCCCACTT CTTCCCACTC TATGCCGGCG GGGGCTTCCT GGCCGCCCTG 300

AGTGCAGGCC GCTACCTGGG AGCAGCCTTC CCCTTGGGCT ACCAAGCCTT CCGGAGGCCG 360

TGCTATTCCT GGGGGGTGTG CGCGGCCATC TGGGCCCTCG TCCTGTGTCA CCTGGGTCTG 420

GTCTTTGGGT TGGAGGCTCC AGGAGGCTGG CTGGACCACA GCAACACCTC CCTGGGCATC 480

AACACACCGG TCAACGGCTC TCCGGTCTGC CTGGAGGCCT GGGACCCGGC CTCTGCCGGC 540

CCGGCCCGCT TCAGCCTCTC TCTCCTGCTC TTTTTTCTGC CCTTGGCCAT CACAGCCTTC 600

TGCTACGTGG GCTGCCTCCG GGCACTGGCC CGCTCCGGCC TGACGCACAG GCGGAAGCTG 660

CGGGCCAAAT GGGTGGCCGG CGGGGCCCTC CTCACGCTGC TGCTCTGCGT AGGACCCTAC 720

AACGCCTCCA ACGTGGCCAG CTTCCTGTAC CCCAATCTAG GAGGCTCCTG GCGGAAGCTG 780

GGGCTCATCA CGGGTGCCTG GAGTGTGGTG CTTAATCCGC TGGTGACCGG TTACTTGGGA 840

AGGGGTCCTG GCCTGAAGAC AGTGTGTGCG GCAAGAACGC AAGGGGGCAA GTCCCAGAAG 900

TAA 903

(273) INFORMATION FOR SEQ ID NO:272: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 300 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272:

Met Asp Leu Pro Pro Gln Leu Ser Phe Gly Leu Tyr Val Ala Ala Phe
1 5 10 15

Ala Leu Gly Phe Pro Leu Asn Val Leu Ala Ile Arg Gly Ala Thr Ala
20 25 30

His Ala Arg Leu Arg Leu Thr Pro Ser Leu Val Tyr Ala Leu Asn Leu
35 40 45

Gly Cys Ser Asp Leu Leu Leu Thr Val Ser Leu Pro Leu Lys Ala Val
50 55 60

Glu Ala Leu Ala Ser Gly Ala Trp Pro Leu Pro Ala Ser Leu Cys Pro
65 70 75 80

Val Phe Ala Val Ala His Phe Phe Pro Leu Tyr Ala Gly Gly Gly Phe
85 90 95

Leu Ala Ala Leu Ser Ala Gly Arg Tyr Leu Gly Ala Ala Phe Pro Leu
100 105 110

Gly Tyr Gln Ala Phe Arg Arg Pro Cys Tyr Ser Trp Gly Val Cys Ala
115 120 125

Ala Ile Trp Ala Leu Val Leu Cys His Leu Gly Leu Val Phe Gly Leu
130 135 140

Glu Ala Pro Gly Gly Trp Leu Asp His Ser Asn Thr Ser Leu Gly Ile
145 150 155 160

Asn Thr Pro Val Asn Gly Ser Pro Val Cys Leu Glu Ala Trp Asp Pro
165 170 175

Ala Ser Ala Gly Pro Ala Arg Phe Ser Leu Ser Leu Leu Leu Phe Phe
180 185 190

Leu Pro Leu Ala Ile Thr Ala Phe Cys Tyr Val Gly Cys Leu Arg Ala
195 200 205

Leu Ala Arg Ser Gly Leu Thr His Arg Arg Lys Leu Arg Ala Lys Trp
210 215 220

Val Ala Gly Gly Ala Leu Leu Thr Leu Leu Leu Cys Val Gly Pro Tyr
225 230 235 240

Asn Ala Ser Asn Val Ala Ser Phe Leu Tyr Pro Asn Leu Gly Gly Ser
245 250 255

Trp Arg Lys Leu Gly Leu Ile Thr Gly Ala Trp Ser Val Val Leu Asn
260 265 270

Pro Leu Val Thr Gly Tyr Leu Gly Arg Gly Pro Gly Leu Lys Thr Val
275 280 285

Cys Ala Ala Arg Thr Gln Gly Gly Lys Ser Gln Lys
290 295 300

(274) INFORMATION FOR SEQ ID NO:273: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1041 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:273: 

ATGGATACAG GCCCCGACCA GTCCTACTTC TCCGGCAATC ACTGGTTCGT CTTCTCGGTG 60

TACCTTCTCA CTTTCCTGGT GGGGCTCCCC CTCAACCTGC TGGCCCTGGT GGTCTTCGTG 120

GGCAAGCTGC AGCGCCGCCC GGTGGCCGTG GACGTGCTCC TGCTCAACCT GACCGCCTCG 180

GACCTGCTCC TGCTGCTGTT CCTGCCTTTC CGCATGGTGG AGGCAGCCAA TGGCATGCAC 240

TGGCCCCTGC CCTTCATCCT CTGCCCACTC TCTGGATTCA TCTTCTTCAC CACCATCTAT 300

CTCACCGCCC TCTTCCTGGC AGCTGTGAGC ATTGAACGCT TCCTGAGTGT GGCCCACCCA 360

CTGTGGTACA AGACCCGGCC GAGGCTGGGG CAGGCAGGTC TGGTGAGTGT GGCCTGCTGG 420

CTGTTGGCCT CTGCTCACTG CAGCGTGGTC TACGTCATAG AATTCTCAGG GGACATCTCC 480

CACAGCCAGG GCACCAATGG GACCTGCTAC CTGGAGTTCC GGAAGGACCA GCTAGCCATC 540

CTCCTGCCCG TGCGGCTGGA GATGGCTGTG GTCCTCTTTG TGGTCCCGCT GATCATCACC 600

AGCTACTGCT ACAGCCGCCT GGTGTGGATC CTCGGCAGAG GGGGCAGCCA CCGCCGGCAG 660

AGGAGGGTGA AGGGGCTGTT GGCGGCCACG CTGCTCAACT TCCTTGTCTG CTTTGGGCCC 720

TACAACGTGT CCCATGTCGT GGGCTATATC TGCGGTGAAA GCCCGGCATG GAGGATCTAC 780

GTGACGCTTC TCAGCACCCT GAACTCCTGT GTCGACCCCT TTGTCTACTA CTTCTCCTCC 840

TCCGGGTTCC AAGCCGACTT TCATGAGCTG CTGAGGAGGT TGTGTGGGCT CTGGGGCCAG 900

TGGCAGCAGG AGAGCAGCAT GGAGCTGAAG GAGCAGAAGG GAGGGGAGGA GCAGAGAGCG 960

GACCGACCAG CTGAAAGAAA GACCAGTGAA CACTCACAGG GCTGTGGAAC TGGTGGCCAG 1020

GTGGCCTGTG CTGAAAGCTA G 1041
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(275) INFORMATION FOR SEQ ID NO:274: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 346 amino acids

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:274:

Met Asp Thr Gly Pro Asp Gln Ser Tyr Phe Ser Gly Asn His Trp Phe
1 5 10 15

Val Phe Ser Val Tyr Leu Leu Thr Phe Leu Val Gly Leu Pro Leu Asn
20 25 30

Leu Leu Ala Leu Val Val Phe Val Gly Lys Leu Gln Arg Arg Pro Val
35 40 45

Ala Val Asp Val Leu Leu Leu Asn Leu Thr Ala Ser Asp Leu Leu Leu
50 55 60

Leu Leu Phe Leu Pro Phe Arg Met Val Glu Ala Ala Asn Gly Met His
65 70 75 80

Trp Pro Leu Pro Phe Ile Leu Cys Pro Leu Ser Gly Phe Ile Phe Phe
85 90 95

Thr Thr Ile Tyr Leu Thr Ala Leu Phe Leu Ala Ala Val Ser Ile Glu
100 105 110

Arg Phe Leu Ser Val Ala His Pro Leu Trp Tyr Lys Thr Arg Pro Arg
115 120 125

Leu Gly Gln Ala Gly Leu Val Ser Val Ala Cys Trp Leu Leu Ala Ser
130 135 140

Ala His Cys Ser Val Val Tyr Val Ile Glu Phe Ser Gly Asp Ile Ser
145 150 155 160

His Ser Gln Gly Thr Asn Gly Thr Cys Tyr Leu Glu Phe Arg Lys Asp
165 170 175

Gln Leu Ala Ile Leu Leu Pro Val Arg Leu Glu Met Ala Val Val Leu
180 185 190

Phe Val Val Pro Leu Ile Ile Thr Ser Tyr Cys Tyr Ser Arg Leu Val
195 200 205

Trp Ile Leu Gly Arg Gly Gly Ser His Arg Arg Gln Arg Arg Val Lys
210 215 220

Gly Leu Leu Ala Ala Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro
225 230 235 240

Tyr Asn Val Ser His Val Val Gly Tyr Ile Cys Gly Glu Ser Pro Ala
245 250 255

Trp Arg Ile Tyr Val Thr Leu Leu Ser Thr Leu Asn Ser Cys Val Asp
260 265 270

Pro Phe Val Tyr Tyr Phe Ser Ser Ser Gly Phe Gln Ala Asp Phe His
275 280 285

Glu Leu Leu Arg Arg Leu Cys Gly Leu Trp Gly Gln Trp Gln Gln Glu
290 295 300

Ser Ser Met Glu Leu Lys Glu Gln Lys Gly Gly Glu Glu Gln Arg Ala
305 310 315 320

Asp Arg Pro Ala Glu Arg Lys Thr Ser Glu His Ser Gln Gly Cys Gly
325 330 335

Thr Gly Gly Gln Val Ala Cys Ala Glu Ser
340 345
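SEQ ID NO:274 appears to be the conceptual translation of the open reading frame in SEQ ID NO:273: the DNA opens with ATG GAT ACA (Met Asp Thr) and its last row ends in the TAG stop codon. The sketch below is an illustration only, not part of the original listing. It assumes the standard genetic code (NCBI translation table 1) and that the DNA entry is read in frame from its first base; the helper names `clean`, `translate` and `to_three_letter` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest, "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter codes, matching the notation of the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def clean(seq: str) -> str:
    """Drop the spaces and position numbers used in the listing layout."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    dna = clean(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_three_letter(protein: str) -> str:
    return " ".join(THREE_LETTER[aa] for aa in protein)


# First 30 nucleotides of SEQ ID NO:273 (the "60" row, first three blocks).
head = "ATGGATACAG GCCCCGACCA GTCCTACTTC"
print(to_three_letter(translate(head)))  # Met Asp Thr Gly Pro Asp Gln Ser Tyr Phe
```

The output matches the first ten residues of SEQ ID NO:274. Applied to the final row of SEQ ID NO:273 (GTGGCCTGTG CTGAAAGCTA G, which starts at codon 341), the same function yields Val Ala Cys Ala Glu Ser and then stops at TAG, matching residues 341 to 346.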

(276) INFORMATION FOR SEQ ID NO:275: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 993 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:275:

ATGCTGCCGG ACTGGAAGAG CTCCTTGATC CTCATGGCTT ACATCATCAT CTTCCTCACT 60

GGCCTCCCTG CCAACCTCCT GGCCCTGCGG GCCTTTGTGG GGCGGATCCG CCAGCCCCAG 120

CCTGCACCTG TGCACATCCT CCTGCTGAGC CTGACGCTGG CCGACCTCCT CCTGCTGCTG 180

CTGCTGCCCT TCAAGATCAT CGAGGCTGCG TCGAACTTCC GCTGGTACCT GCCCAAGGTC 240

GTCTGCGCCC TCACGAGTTT TGGCTTCTAC AGCAGCATCT ACTGCAGCAC GTGGCTCCTG 300

GCGGGCATCA GCATCGAGCG CTACCTGGGA GTGGCTTTCC CCGTGCAGTA CAAGCTCTCC 360

CGCCGGCCTC TGTATGGAGT GATTGCAGCT CTGGTGGCCT GGGTTATGTC CTTTGGTCAC 420

TGCACCATCG TGATCATCGT TCAATACTTG AACACGACTG AGCAGGTCAG AAGTGGCAAT 480

GAAATTACCT GCTACGAGAA CTTCACCGAT AACCAGTTGG ACGTGGTGCT GCCCGTGCGG 540

CTGGAGCTGT GCCTGGTGCT CTTCTTCATC CCCATGGCAG TCACCATCTT CTGCTACTGG 600

CGTTTTGTGT GGATCATGCT CTCCCAGCCC CTTGTGGGGG CCCAGAGGCG GCGCCGAGCC 660

AAGGGGCTGG CTGTGGTGAC GCTGCTCAAT TTCCTGGTGT GCTTCGGACC TTACAACGTG 720

TCCCACCTGG TGGGGTATCA CCAGAGAAAA AGCCCCTGGT GGCGGTCAAT AGCCGTGGTG 780

TTCAGTTCAC TCAACGCCAG TCTGGACCCC CTGCTCTTCT ATTTCTCTTC TTCAGTGGTG 840

CGCAGGGCAT TTGGGAGAGG GCTGCAGGTG CTGCGGAATC AGGGCTCCTC CCTGTTGGGA 900

CGCAGAGGCA AAGACACAGC AGAGGGGACA AATGAGGACA GGGGTGTGGG TCAAGGAGAA 960

GGGATGCCAA GTTCGGACTT CACTACAGAG TAG 993

(277) INFORMATION FOR SEQ ID NO:276:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 330 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:276:

Met Leu Pro Asp Trp Lys Ser Ser Leu Ile Leu Met Ala Tyr Ile Ile
1 5 10 15

Ile Phe Leu Thr Gly Leu Pro Ala Asn Leu Leu Ala Leu Arg Ala Phe
20 25 30

Val Gly Arg Ile Arg Gln Pro Gln Pro Ala Pro Val His Ile Leu Leu
35 40 45

Leu Ser Leu Thr Leu Ala Asp Leu Leu Leu Leu Leu Leu Leu Pro Phe
50 55 60

Lys Ile Ile Glu Ala Ala Ser Asn Phe Arg Trp Tyr Leu Pro Lys Val
65 70 75 80

Val Cys Ala Leu Thr Ser Phe Gly Phe Tyr Ser Ser Ile Tyr Cys Ser
85 90 95

Thr Trp Leu Leu Ala Gly Ile Ser Ile Glu Arg Tyr Leu Gly Val Ala
100 105 110

Phe Pro Val Gln Tyr Lys Leu Ser Arg Arg Pro Leu Tyr Gly Val Ile
115 120 125

Ala Ala Leu Val Ala Trp Val Met Ser Phe Gly His Cys Thr Ile Val
130 135 140

Ile Ile Val Gln Tyr Leu Asn Thr Thr Glu Gln Val Arg Ser Gly Asn
145 150 155 160

Glu Ile Thr Cys Tyr Glu Asn Phe Thr Asp Asn Gln Leu Asp Val Val
165 170 175

Leu Pro Val Arg Leu Glu Leu Cys Leu Val Leu Phe Phe Ile Pro Met
180 185 190

Ala Val Thr Ile Phe Cys Tyr Trp Arg Phe Val Trp Ile Met Leu Ser
195 200 205

Gln Pro Leu Val Gly Ala Gln Arg Arg Arg Arg Ala Lys Gly Leu Ala
210 215 220

Val Val Thr Leu Leu Asn Phe Leu Val Cys Phe Gly Pro Tyr Asn Val
225 230 235 240

Ser His Leu Val Gly Tyr His Gln Arg Lys Ser Pro Trp Trp Arg Ser
245 250 255

Ile Ala Val Val Phe Ser Ser Leu Asn Ala Ser Leu Asp Pro Leu Leu
260 265 270

Phe Tyr Phe Ser Ser Ser Val Val Arg Arg Ala Phe Gly Arg Gly Leu
275 280 285

Gln Val Leu Arg Asn Gln Gly Ser Ser Leu Leu Gly Arg Arg Gly Lys
290 295 300

Asp Thr Ala Glu Gly Thr Asn Glu Asp Arg Gly Val Gly Gln Gly Glu
305 310 315 320 

Gly Met Pro Ser Ser Asp Phe Thr Thr Glu 
325 330 

(278) INFORMATION FOR SEQ ID NO:277:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2724 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:277: 

ATGGACACCT CCCGGCTCGG TGTGCTCCTG TCCTTGCCTG TGCTGCTGCA GCTGGCGACC 60 

GGGGGCAGCT CTCCCAGGTC TGGTGTGTTG CTGAGGGGCT GCCCCACACA CTGTCATTGC 120

GAGCCCGACG GCAGGATGTT GCTCAGGGTG GACTGCTCCG ACCTGGGGCT CTCGGAGCTG 180

CCTTCCAACC TCAGCGTCTT CACCTCCTAC CTAGACCTCA GTATGAACAA CATCAGTCAG 240

CTGCTCCCGA ATCCCCTGCC CAGTCTCCGC TTCCTGGAGG AGTTACGTCT TGCGGGAAAC 300
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GCTCTGACAT ACATTCCCAA GGGAGCATTC 
CTGCAGAATA ATCAGCTAAG ACACGTACCC 
CAATCCCTGC GTCTGGATGC TAACCACATC 
CTGCATTCCC TGAGGCACCT GTGGCTGGAT 
5 GCTTTTAGAA GTTTATCGGC ATTGCAAGCC 
ATACCAGACT ATGC CTTTGG AAACCTCTCC 
AGAATCCACT CCCTGGGAAA GAAATGCTTT 
TTAAATTACA ATAACCTTGA TGAATTCCCC 
GAACTAGGAT TTCATAGCAA CAATATCAGG 

10 CCTTCTCTTA TTACAATACA TTTCTATGAC 
TTTCAACATT TACCTGAACT AAGAACACTG 
TTTCCTGATT TAACTGGAAC TGCAAACCTG 
TCATCTCTTC CTCAAACCGT CTGCAATCAG 
TACAACCTAT TAGAAGATTT ACCCAGTTTT 

15 CTAAGACATA ATGAAATCTA CGAAATTAAA 
CGATCGCTGA ATTTGGCTTG GAACAAAATT 
TTGCCATCCC TAATAAAGCT GGAC CTATCG 
GGGTTACATG GTTTAACTCA CTTAAAATTA 
TCATCTGAAA ACTTTCCAGA ACTCAAGGTT 

20 GCATTTGGAG TGTGTGAGAA TGCCTATAAG 
AG CAGTATGG ACGACCTTCA TAAGAAAGAT 
GACCTTGAAG ATTTCCTGCT TGACTTTGAG 
TGTTCACCTT CCCCAGGCCC CTTCAAACCC 
AGAATTGGAG TGTG G AC CAT AGCAGTTCTG 

25 ACAGTTTTCA GATCCCCTCT GTACATTTCC 
GCAGTGAACA TGCTCACGGG AGTCTCCAGT 
TTTGGCAGCT TTGCACGACA TGGTGCCTGG 
GGTTTTTTGT CCATTTTTGC TTCAGAATCA 
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ACTGGCCTTT ACAGTCTTAA AGTTCTTATG 36 0 

ACAGAAGCTC TGCAGAATTT GCGAAGCCTT 420 

AG CTATGTGC CCCCAAGCTG TTTCAGTGGC 4C0 

GACAATGCGT TAACAGAAAT CCCCGTCCAG 54 0 

ATGACCTTGG CCCTGAACAA AATACACCAC 600 

AG CTTGGT AG TTCTACATCT CCATAACAAT 66 0 

GATGGGCTCC ACAGCCTAGA GACTTTAGAT 72 0 

ACTGCAATTA GGACACTCTC CAACCTTAAA 780 

TCGATACCTG AGAAAGCATT TGTAGGCAAC 84 0 

AATCCCATCC AATTTGTTGG GAGATCTGCT 900 

ACTCTGAATG GTGCCTCACA AATAACTGAA 96 0 

GAGAGTCTGA CTTTAACTGG AG CAC AGATC 1020 

TTACCTAATC TCCAAGTGCT AGATCTGTCT 1080 

TCAGTCTGCC AAAAGCTTCA GAAAATTGAC 114 0 

GTTGACACTT TCCAGCAGTT GCTTAGCCTC 1200 

GCTATTATTC ACCCCAATGC ATTTTCCACT 1260 

TCCAACCTCC TGTCGTCTTT TCCTATAACT 132 0 

ACAGGAAATC ATGCCTTACA GAGCTTGATA 13 8 0 

AT AG AAATG C CTTATGCTTA CCAGTGCTGT 144 0 

ATTTCTAATC AATGGAATAA AGGTGACAAC 1500 

GCTGGAATGT TTCAGGCTCA AGATGAACGT 1560 

GAAGACCTGA AAGCCCTTCA TTCAGTGCAG 162 0 

TGTGAACACC TGCTTGATGG CTGGCTGATC 16 8 0 

GCACTTACTT GTAATG CTTT GGTGACTTCA 174 0 

CCCATTAAAC TGTTAATTGG GGTCATCGCA 18 00 

GCCGTGCTGG CTGGTGTGGA TG CGTTC ACT 1860 

TGGGAGAATG GGGTTGGTTG CCATGTCATT .1920 

TCTGTTTTCC TGCTTACTCT GGCAGCCCTG 198 0 
[illegible] [illegible] ATATTCTGCA AAATTTGAAA CGAAAGCTCC ATTTTCTAGC 2040

[illegible] [illegible] CTGTGCCCTG CTGGCCTTGA CCATGGCCGC AGTTCCCCTG 2100

[illegible] [illegible] CGCCTCCCCT CTCTGCCTGC CTTTGCCTTT TGGGGAGCCC 2160

[illegible] [illegible] [illegible] TTGCTCAATT CCCTTTGCTT CCTCATGATG 2220

[illegible] [illegible] GTACTGCAAT TTGGACAAGG GAGACCTGGA GAATATTTGG 2280

[illegible] TGAAAAAACA CATTGCCCTG TTGCTCTTCA CCAACTGCAT CCTAAACTGC 2340

[illegible] TCTTGTCCTT CTCCTCTTTA ATAAACCTTA CATTTATCAG TCCTGAAGTA 2400

ATTAAGTTTA TCCTTCTGGT GGTAGTCCCA CTTCCTGCAT GTCTCAATCC CCTTCTCTAC 2460

ATCTTGTTCA ATCCTCACTT TAAGGAGGAT CTGGTGAGCC TGAGAAAGCA AACCTACGTC 2520

TGGACAAGAT CAAAACACCC AAGCTTGATG TCAATTAACT CTGATGATGT CGAAAAACAG 2580

TCCTGTGACT CAACTCAAGC CTTGGTAACC TTTACCAGCT CCAGCATCAC TTATGACCTG 2640

CCTCCCAGTT CCGTGCCATC ACCAGCTTAT CCAGTGACTG AGAGCTGCCA TCTTTCCTCT 2700

GTGGCATTTG TCCCATGTCT CTAA 2724


(279) INFORMATION FOR SEQ ID NO:278:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 907 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:278: 
Met Asp Thr Ser Arg Leu Gly Val Leu Leu Ser Leu Pro Val Leu Leu
1 5 10 15

Gln Leu Ala Thr Gly Gly Ser Ser Pro Arg Ser Gly Val Leu Leu Arg
20 25 30

Gly Cys Pro Thr His Cys His Cys Glu Pro Asp Gly Arg Met Leu Leu
35 40 45

Arg Val Asp Cys Ser Asp Leu Gly Leu Ser Glu Leu Pro Ser Asn Leu
50 55 60

Ser Val Phe Thr Ser Tyr Leu Asp Leu Ser Met Asn Asn Ile Ser Gln
65 70 75 80

Leu Leu Pro Asn Pro Leu Pro Ser Leu Arg Phe Leu Glu Glu Leu Arg
85 90 95

Leu Ala Gly Asn Ala Leu Thr Tyr Ile Pro Lys Gly Ala Phe Thr Gly
100 105 110 

Leu Tyr Ser Leu Lys Val Leu Met Leu Gln Asn Asn Gln Leu Arg His
115 120 125

Val Pro Thr Glu Ala Leu Gln Asn Leu Arg Ser Leu Gln Ser Leu Arg
130 135 140

Leu Asp Ala Asn His Ile Ser Tyr Val Pro Pro Ser Cys Phe Ser Gly
145 150 155 160

Leu His Ser Leu Arg His Leu Trp Leu Asp Asp Asn Ala Leu Thr Glu
165 170 175

Ile Pro Val Gln Ala Phe Arg Ser Leu Ser Ala Leu Gln Ala Met Thr
180 185 190

Leu Ala Leu Asn Lys Ile His His Ile Pro Asp Tyr Ala Phe Gly Asn
195 200 205

Leu Ser Ser Leu Val Val Leu His Leu His Asn Asn Arg Ile His Ser
210 215 220

Leu Gly Lys Lys Cys Phe Asp Gly Leu His Ser Leu Glu Thr Leu Asp
225 230 235 240

Leu Asn Tyr Asn Asn Leu Asp Glu Phe Pro Thr Ala Ile Arg Thr Leu
245 250 255

Ser Asn Leu Lys Glu Leu Gly Phe His Ser Asn Asn Ile Arg Ser Ile
260 265 270

Pro Glu Lys Ala Phe Val Gly Asn Pro Ser Leu Ile Thr Ile His Phe
275 280 285

Tyr Asp Asn Pro Ile Gln Phe Val Gly Arg Ser Ala Phe Gln His Leu
290 295 300

Pro Glu Leu Arg Thr Leu Thr Leu Asn Gly Ala Ser Gln Ile Thr Glu
305 310 315 320

Phe Pro Asp Leu Thr Gly Thr Ala Asn Leu Glu Ser Leu Thr Leu Thr
325 330 335

Gly Ala Gln Ile Ser Ser Leu Pro Gln Thr Val Cys Asn Gln Leu Pro
340 345 350

Asn Leu Gln Val Leu Asp Leu Ser Tyr Asn Leu Leu Glu Asp Leu Pro
355 360 365

Ser Phe Ser Val Cys Gln Lys Leu Gln Lys Ile Asp Leu Arg His Asn
370 375 380

Glu Ile Tyr Glu Ile Lys Val Asp Thr Phe Gln Gln Leu Leu Ser Leu
385 390 395 400

Arg Ser Leu Asn Leu Ala Trp Asn Lys Ile Ala Ile Ile His Pro Asn
405 410 415

Ala Phe Ser Thr Leu Pro Ser Leu Ile Lys Leu Asp Leu Ser Ser Asn
420 425 430

Leu Leu Ser Ser Phe Pro Ile Thr Gly Leu His Gly Leu Thr His Leu
435 440 445

Lys Leu Thr Gly Asn His Ala Leu Gln Ser Leu Ile Ser Ser Glu Asn
450 455 460

Phe Pro Glu Leu Lys Val Ile Glu Met Pro Tyr Ala Tyr Gln Cys Cys
465 470 475 480

Ala Phe Gly Val Cys Glu Asn Ala Tyr Lys Ile Ser Asn Gln Trp Asn
485 490 495

Lys Gly Asp Asn Ser Ser Met Asp Asp Leu His Lys Lys Asp Ala Gly
500 505 510

Met Phe Gln Ala Gln Asp Glu Arg Asp Leu Glu Asp Phe Leu Leu Asp
515 520 525

Phe Glu Glu Asp Leu Lys Ala Leu His Ser Val Gln Cys Ser Pro Ser
530 535 540

Pro Gly Pro Phe Lys Pro Cys Glu His Leu Leu Asp Gly Trp Leu Ile
545 550 555 560

Arg Ile Gly Val Trp Thr Ile Ala Val Leu Ala Leu Thr Cys Asn Ala
565 570 575

Leu Val Thr Ser Thr Val Phe Arg Ser Pro Leu Tyr Ile Ser Pro Ile
580 585 590

Lys Leu Leu Ile Gly Val Ile Ala Ala Val Asn Met Leu Thr Gly Val
595 600 605

Ser Ser Ala Val Leu Ala Gly Val Asp Ala Phe Thr Phe Gly Ser Phe
610 615 620

Ala Arg His Gly Ala Trp Trp Glu Asn Gly Val Gly Cys His Val Ile
625 630 635 640

Gly Phe Leu Ser Ile Phe Ala Ser Glu Ser Ser Val Phe Leu Leu Thr
645 650 655

Leu Ala Ala Leu Glu Arg Gly Phe Ser Val Lys Tyr Ser Ala Lys Phe
660 665 670

Glu Thr Lys Ala Pro Phe Ser Ser Leu Lys Val Ile Ile Leu Leu Cys
675 680 685

Ala Leu Leu Ala Leu Thr Met Ala Ala Val Pro Leu Leu Gly Gly Ser
690 695 700

Lys Tyr Gly Ala Ser Pro Leu Cys Leu Pro Leu Pro Phe Gly Glu Pro
705 710 715 720

Ser Thr Met Gly Tyr Met Val Ala Leu Ile Leu Leu Asn Ser Leu Cys
725 730 735

Phe Leu Met Met Thr Ile Ala Tyr Thr Lys Leu Tyr Cys Asn Leu Asp
740 745 750

Lys Gly Asp Leu Glu Asn Ile Trp Asp Cys Ser Met Lys Lys His Ile
755 760 765

Ala Leu Leu Leu Phe Thr Asn Cys Ile Leu Asn Cys Pro Val Ala Phe
770 775 780

Leu Ser Phe Ser Ser Leu Ile Asn Leu Thr Phe Ile Ser Pro Glu Val
785 790 795 800

Ile Lys Phe Ile Leu Leu Val Val Val Pro Leu Pro Ala Cys Leu Asn
805 810 815

Pro Leu Leu Tyr Ile Leu Phe Asn Pro His Phe Lys Glu Asp Leu Val
820 825 830

Ser Leu Arg Lys Gln Thr Tyr Val Trp Thr Arg Ser Lys His Pro Ser
835 840 845

Leu Met Ser Ile Asn Ser Asp Asp Val Glu Lys Gln Ser Cys Asp Ser
850 855 860

Thr Gln Ala Leu Val Thr Phe Thr Ser Ser Ser Ile Thr Tyr Asp Leu
865 870 875 880

Pro Pro Ser Ser Val Pro Ser Pro Ala Tyr Pro Val Thr Glu Ser Cys
885 890 895

His Leu Ser Ser Val Ala Phe Val Pro Cys Leu 
900 905 
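Each of the three DNA entries above begins with ATG and ends in a single stop codon (TAG for SEQ ID NOs:273 and 275, TAA for SEQ ID NO:277). If each is a complete open reading frame, its length in base pairs should equal three times the protein length plus one stop codon, bp = 3 × (aa + 1). The short sketch below, which is illustrative and not part of the listing, checks that arithmetic against the declared LENGTH fields; the names `PAIRS` and `expected_protein_length` are hypothetical.

```python
# Declared lengths from the listing: (DNA SEQ ID, bp, protein SEQ ID, aa).
PAIRS = [
    (273, 1041, 274, 346),
    (275, 993, 276, 330),
    (277, 2724, 278, 907),
]


def expected_protein_length(bp: int) -> int:
    """Residues encoded by a complete ORF of `bp` nucleotides ending in one stop codon."""
    if bp % 3:
        raise ValueError(f"{bp} bp is not a whole number of codons")
    return bp // 3 - 1


for dna_id, bp, prot_id, aa in PAIRS:
    status = "consistent" if expected_protein_length(bp) == aa else "MISMATCH"
    print(f"SEQ ID NO:{dna_id} ({bp} bp) -> SEQ ID NO:{prot_id} ({aa} aa): {status}")
```

All three pairs come out consistent (1041 = 3 × 347, 993 = 3 × 331, 2724 = 3 × 908), which also supports reading the partly illegible rows of SEQ ID NO:277 as a single continuous reading frame.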

(280) INFORMATION FOR SEQ ID NO: 279: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:279:
CATGCCAACC GGCCCGCGAG GCTGCTGCTG GT 32 
(281) INFORMATION FOR SEQ ID NO:280:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:280: 

ACCAGCAGCA GCCTCGCGGG CCGGTTGGCA TG 
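SEQ ID NO:279 and SEQ ID NO:280 are both 32 bases long, and SEQ ID NO:280 reads as the exact reverse complement of SEQ ID NO:279, as expected for a pair of complementary oligonucleotides; this excerpt does not state how the pair was used. The sketch below, an illustration that is not part of the listing, confirms the relationship. The helper names `clean` and `reverse_complement` are hypothetical.

```python
# Watson-Crick complement for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spaces used to group the listing into 10-base blocks."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


seq_279 = "CATGCCAACC GGCCCGCGAG GCTGCTGCTG GT"
seq_280 = "ACCAGCAGCA GCCTCGCGGG CCGGTTGGCA TG"

print(len(clean(seq_279)), len(clean(seq_280)))  # 32 32
print(reverse_complement(seq_279) == clean(seq_280))  # True
```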